Measure-valued Markov chains have raised interest in Bayesian nonparametrics since the seminal
paper by Feigin and Tweedie (Math. Proc. Cambridge Philos. Soc. 105 (1989) 579-585), where a
Markov chain having the law of the Dirichlet process as unique invariant measure was introduced.
In the present paper, we propose and investigate a new class of measure-valued Markov chains
defined via exchangeable sequences of random variables. Asymptotic properties for this new
class are derived and applications related to Bayesian nonparametric mixture modeling, and
to a generalization of the Markov chain proposed by Feigin and Tweedie (Math. Proc. Cambridge
Philos. Soc. 105 (1989) 579-585), are discussed. These results and their applications highlight
once again the interplay between Bayesian nonparametrics and the theory of measure-valued
Markov chains.
1. Introduction

Measure-valued Markov chains, or more generally measure-valued Markov processes, arise
naturally in modeling the composition of evolving populations and play an important
role in a variety of research areas such as population genetics and bioinformatics (see,
e.g., [5, 9, 10, 26]), Bayesian nonparametrics [31, 38], combinatorics [26] and statistical
physics [5, 6, 26]. In particular, in Bayesian nonparametrics there has been interest in
measure-valued Markov chains since the seminal paper by [12], where the law of the
Dirichlet process has been characterized as the unique invariant measure of a certain
measure-valued Markov chain.

In order to introduce the result by [12], let us consider a Polish space $\mathbb{X}$ endowed with
the Borel $\sigma$-field $\mathscr{X}$ and let $\mathcal{P}_{\mathbb{X}}$ be the space of probability
measures on $\mathbb{X}$ with the $\sigma$-field generated by the topology of weak convergence. If
$\alpha$ is a strictly positive finite measure on $\mathbb{X}$ with total mass $a > 0$, $Y$ is a
$\mathbb{X}$-valued random variable (r.v.) distributed according to $\alpha_0 := \alpha/a$ and $\theta$
is a r.v. independent of $Y$ and distributed according to a Beta distribution with parameter $(1, a)$
then, Theorem 3.4 in [33] implies that a Dirichlet process $P$ on $\mathbb{X}$ with parameter $\alpha$
uniquely satisfies the distributional equation

$$P \overset{d}{=} \theta \delta_Y + (1 - \theta) P, \qquad (1)$$

where all the random elements on the right-hand side of (1) are independent. All the
r.v.s introduced in this paper are meant to be assigned on a probability space
$(\Omega, \mathscr{F}, \mathbb{P})$ unless otherwise stated. In [12], (1) is recognized as the
distributional equation for the unique invariant measure of a measure-valued Markov chain
$\{P_m, m \ge 0\}$ defined via the recursive identity

$$P_m = \theta_m \delta_{Y_m} + (1 - \theta_m) P_{m-1}, \qquad m \ge 1, \qquad (2)$$

where $P_0 \in \mathcal{P}_{\mathbb{X}}$ is arbitrary, $\{Y_m, m \ge 1\}$ is a sequence of
$\mathbb{X}$-valued r.v.s independent and identically distributed as $Y$ and $\{\theta_m, m \ge 1\}$
is a sequence of r.v.s, independent and identically distributed as $\theta$ and independent of
$\{Y_m, m \ge 1\}$. We term $\{P_m, m \ge 0\}$ the Feigin-Tweedie Markov chain. By investigating
the functional Markov chain $\{G_m, m \ge 0\}$, with $G_m := \int_{\mathbb{X}} g(x) P_m(dx)$ for any
$m \ge 0$ and for any measurable linear function $g : \mathbb{X} \to \mathbb{R}$, [12] provide
properties of the corresponding linear functional of a Dirichlet process. In particular, the
existence of the linear functional $G := \int_{\mathbb{X}} g(x) P(dx)$ of the Dirichlet process $P$
is characterized according to the condition $\int_{\mathbb{X}} \log(1 + |g(x)|)\,\alpha(dx) < +\infty$;
these functionals were considered by [16] and their existence was also investigated by [7], who
referred to them as moments, as well as by [39] and [4]. Further developments of the linear
functional Markov chain $\{G_m, m \ge 0\}$ are provided by [15, 17] and more recently by [8].
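
To make the recursion (2) concrete, the following is a minimal simulation sketch of the mean functional ($g(x) = x$) of the Feigin-Tweedie chain. It is an illustration only: the function name `feigin_tweedie_mean_path`, the choice of $\alpha_0$ as the Uniform distribution on (0, 1) and the value $a = 10$ are assumptions made for this example, not part of the original construction.

```python
import random


def feigin_tweedie_mean_path(a, draw_y, steps, g0=0.0, seed=0):
    """Path of G_m = theta_m * Y_m + (1 - theta_m) * G_{m-1}, i.e. (2) with g(x) = x."""
    rng = random.Random(seed)
    path = [g0]
    for _ in range(steps):
        theta = rng.betavariate(1.0, a)  # theta_m ~ Beta(1, a)
        y = draw_y(rng)                  # Y_m ~ alpha_0 = alpha / a
        path.append(theta * y + (1.0 - theta) * path[-1])
    return path


# Hypothetical setting: alpha_0 uniform on (0, 1), total mass a = 10.
path = feigin_tweedie_mean_path(a=10.0, draw_y=lambda rng: rng.random(), steps=500)
```

Each step is a convex combination of a fresh draw from $\alpha_0$ and the previous state, so with this choice of $\alpha_0$ the whole path stays in the unit interval.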

Starting from the distributional equation (1), a constructive definition of the Dirichlet
process has been proposed by [33]. If $P$ is a Dirichlet process on $\mathbb{X}$ with parameter
$\alpha = a\alpha_0$, then $P = \sum_{1 \le i < \infty} p_i \delta_{Y_i}$, where $\{Y_i, i \ge 1\}$ is a
sequence of independent r.v.s identically distributed according to $\alpha_0$ and $\{p_i, i \ge 1\}$
is a sequence of r.v.s independent of $\{Y_i, i \ge 1\}$ and derived by the so-called stick-breaking
construction, that is, $p_1 = w_1$ and $p_i = w_i \prod_{1 \le j \le i-1}(1 - w_j)$ for $i > 1$, with
$\{w_i, i \ge 1\}$ being a sequence of independent r.v.s identically distributed according to a Beta
distribution with parameter $(1, a)$. Then, equation (1) arises by considering

$$P = p_1 \delta_{Y_1} + (1 - w_1) \sum_{i=2}^{\infty} \tilde{p}_i \delta_{Y_i},$$

where now $\tilde{p}_2 = w_2$ and $\tilde{p}_i = w_i \prod_{2 \le j \le i-1}(1 - w_j)$ for $i > 2$. Thus,
it is easy to see that $\tilde{P} := \sum_{2 \le i < \infty} \tilde{p}_i \delta_{Y_i}$ is also a Dirichlet
process on $\mathbb{X}$ with parameter $\alpha$ and it is independent of the pair of r.v.s $(p_1, Y_1)$.
If we would extend this idea to $n$ initial samples, we should consider writing

$$P = \theta \sum_{i=1}^{n} \frac{p_i}{\theta} \delta_{Y_i} + (1 - \theta) \tilde{P},$$

where $\theta = \sum_{1 \le i \le n} p_i = 1 - \prod_{1 \le i \le n}(1 - w_i)$ and $\tilde{P}$ is a Dirichlet
process on $\mathbb{X}$ with parameter $\alpha$ independent of the random vectors $(p_1, \ldots, p_n)$
and $(Y_1, \ldots, Y_n)$. However, this is not an easy extension since the distribution of $\theta$ is
unclear, and moreover $\theta$ and $\sum_{1 \le i \le n}(p_i/\theta)\delta_{Y_i}$ are not independent. For
this reason, in [11] an alternative distributional equation has been introduced. Let $\alpha$ be a
strictly positive finite measure on $\mathbb{X}$ with total mass $a > 0$ and let $\{Y_j, j \ge 1\}$ be
a $\mathbb{X}$-valued Pólya sequence with parameter $\alpha$ (see [2]), that is, $\{Y_j, j \ge 1\}$ is a
sequence of $\mathbb{X}$-valued r.v.s characterized by the following predictive distributions

$$\mathbb{P}(Y_{j+1} \in A \mid Y_1, \ldots, Y_j) = \frac{\alpha(A)}{a + j} + \frac{1}{a + j} \sum_{i=1}^{j} \delta_{Y_i}(A), \qquad j \ge 1,$$

and $\mathbb{P}(Y_1 \in A) = \alpha(A)/a$, for any $A \in \mathscr{X}$. The sequence $\{Y_j, j \ge 1\}$ is
exchangeable, that is, for any $j \ge 1$ and any permutation $\sigma$ of the indexes $(1, \ldots, j)$,
the laws of the r.v.s $(Y_1, \ldots, Y_j)$ and $(Y_{\sigma(1)}, \ldots, Y_{\sigma(j)})$ coincide; in
particular, according to the celebrated de Finetti representation theorem, the Pólya sequence
$\{Y_j, j \ge 1\}$ is characterized by a so-called de Finetti measure, which is the law of a Dirichlet
process on $\mathbb{X}$ with parameter $\alpha$. For a fixed integer $n \ge 1$, let
$(q_1^{(n)}, \ldots, q_n^{(n)})$ be a random vector distributed according to the Dirichlet distribution
with parameter $(1, \ldots, 1)$, $\sum_{1 \le i \le n} q_i^{(n)} = 1$, and let $\theta$ be a r.v.
distributed according to a Beta distribution with parameter $(n, a)$ such that $\{Y_i, i \ge 1\}$,
$(q_1^{(n)}, \ldots, q_n^{(n)})$ and $\theta$ are mutually independent. Moving from such a collection of
random elements, Lemma 1 in [11] implies that a Dirichlet process $P^{(n)}$ on $\mathbb{X}$ with
parameter $\alpha$ uniquely satisfies the distributional equation

$$P^{(n)} \overset{d}{=} \theta \sum_{i=1}^{n} q_i^{(n)} \delta_{Y_i} + (1 - \theta) P^{(n)}, \qquad (3)$$

where all the random elements on the right-hand side of (3) are independent. In order
to emphasize the additional parameter $n$, we used a superscript $(n)$ on the Dirichlet
process $P$ and on the random vector $(q_1^{(n)}, \ldots, q_n^{(n)})$. It can be easily checked that
equation (3) generalizes (1), which can be recovered by setting $n = 1$.
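
The three independent ingredients on the right-hand side of (3) can be drawn as in the sketch below. This is an illustration under stated assumptions: the helper names `polya_sequence` and `flat_dirichlet`, the standard Gaussian choice for $\alpha_0$, and the values $n = 5$, $a = 2$ are hypothetical. The Dirichlet vector with parameter $(1, \ldots, 1)$ is obtained by normalizing independent standard exponential r.v.s, a standard representation.

```python
import random


def polya_sequence(n, a, draw_alpha0, rng):
    """Draw Y_1, ..., Y_n from the Polya urn predictive rule with total mass a."""
    ys = []
    for j in range(n):
        # Given j past values: fresh draw from alpha_0 w.p. a / (a + j),
        # otherwise repeat one of the past values chosen uniformly.
        if rng.random() < a / (a + j):
            ys.append(draw_alpha0(rng))
        else:
            ys.append(rng.choice(ys))
    return ys


def flat_dirichlet(n, rng):
    """Dirichlet(1, ..., 1) vector via normalized standard exponentials."""
    e = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(e)
    return [v / total for v in e]


rng = random.Random(0)
n, a = 5, 2.0                                   # hypothetical values
ys = polya_sequence(n, a, lambda r: r.gauss(0.0, 1.0), rng)
qs = flat_dirichlet(n, rng)
theta = rng.betavariate(n, a)                   # theta ~ Beta(n, a)
```

For $n = 1$ the weight vector reduces to the single value 1 and $\theta$ has a Beta distribution with parameter $(1, a)$, which is the setting of equation (1).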

In the present paper, our aim is to further investigate the distributional equation (3)
and its implications in Bayesian nonparametrics theory and methods. The first part of
the paper is devoted to investigating the random element $\sum_{1 \le i \le n} q_i^{(n)} \delta_{Y_i}$
in (3), which is recognized to be the random probability measure (r.p.m.) at the $n$th step of a
measure-valued Markov chain defined via the recursive identity

$$\sum_{i=1}^{n} q_i^{(n)} \delta_{Y_i} = W_n \delta_{Y_n} + (1 - W_n) \sum_{i=1}^{n-1} q_i^{(n-1)} \delta_{Y_i}, \qquad n \ge 1, \qquad (4)$$
where $\{W_n, n \ge 1\}$ is a sequence of independent r.v.s, each $W_n$ distributed according to
a Beta distribution with parameter $(1, n - 1)$, $q_i^{(n)} = W_i \prod_{i+1 \le j \le n}(1 - W_j)$ for
$i = 1, \ldots, n$ and $n \ge 1$, and the sequence $\{W_n, n \ge 1\}$ is independent from
$\{Y_n, n \ge 1\}$. More generally, we observe that the measure-valued Markov chain defined via the
recursive identity (4) can be extended by considering, instead of a Pólya sequence
$\{Y_n, n \ge 1\}$ with parameter $\alpha$, any exchangeable sequence $\{Z_n, n \ge 1\}$ characterized
by some de Finetti measure on $\mathcal{P}_{\mathbb{X}}$ and such that $\{W_n, n \ge 1\}$ is independent
from $\{Z_n, n \ge 1\}$. Asymptotic properties for this new class of measure-valued Markov chains are
derived and some linkages to Bayesian nonparametric mixture modelling are discussed. In particular,
we remark how it is closely related to a well-known recursive algorithm introduced in [25] for
estimating the underlying mixing distribution in mixture models, the so-called Newton's algorithm.

In the second part of the paper, by using finite and asymptotic properties of the
r.p.m. $\sum_{1 \le i \le n} q_i^{(n)} \delta_{Y_i}$ and by following the original idea of [12], we define
and investigate from (3) a class of measure-valued Markov chains $\{P_m^{(n)}, m \ge 0\}$ which
generalizes the Feigin-Tweedie Markov chain, introducing a fixed integer parameter $n$. Our aim is
to provide features of the Markov chain $\{P_m^{(n)}, m \ge 0\}$ in order to verify whether it
preserves some of the properties characterizing the Feigin-Tweedie Markov chain; furthermore, we
are interested in analyzing asymptotic (as $m$ goes to $+\infty$) properties of the associated
linear functional Markov chain $\{G_m^{(n)}, m \ge 0\}$ with
$G_m^{(n)} := \int_{\mathbb{X}} g(x) P_m^{(n)}(dx)$ for any $m \ge 0$ and for any function
$g : \mathbb{X} \to \mathbb{R}$ such that $\int_{\mathbb{X}} \log(1 + |g(x)|)\,\alpha(dx) < +\infty$.
In particular, we show that the Feigin-Tweedie Markov chain $\{P_m, m \ge 0\}$ sits in a larger
class of measure-valued Markov chains $\{P_m^{(n)}, m \ge 0\}$ parametrized by an integer number $n$
and still having the law of a Dirichlet process with parameter $\alpha$ as unique invariant measure.
The role of the further parameter $n$ is discussed in terms of new potential applications of the
Markov chain $\{P_m^{(n)}, m \ge 0\}$ with respect to the known applications of the Feigin-Tweedie
Markov chain.

Following these guidelines, in Section 2 we introduce a new class of measure-valued
Markov chains $\{Q_n, n \ge 1\}$ defined via exchangeable sequences of r.v.s; asymptotic results
for $\{Q_n, n \ge 1\}$ are derived and applications related to Bayesian nonparametric mixture
modelling are discussed. In Section 3, we show that the Feigin-Tweedie Markov chain
$\{P_m, m \ge 0\}$ sits in a larger class of measure-valued Markov chains $\{P_m^{(n)}, m \ge 0\}$,
which is investigated in comparison with $\{P_m, m \ge 0\}$. In Section 4, some concluding remarks
and future research lines are presented.

2. A class of measure-valued Markov chains and Newton's algorithm

Let $\{W_n, n \ge 1\}$ be a sequence of independent r.v.s such that $W_1 = 1$ almost surely
and $W_n$ has Beta distribution with parameter $(1, n - 1)$ for $n \ge 2$. Moreover, let
$\{Z_n, n \ge 1\}$ be a sequence of $\mathbb{X}$-valued exchangeable r.v.s independent from
$\{W_n, n \ge 1\}$ and characterized by some de Finetti measure on $\mathcal{P}_{\mathbb{X}}$. Let us
consider the measure-valued Markov chain $\{Q_n, n \ge 1\}$ defined via the recursive identity

$$Q_n = W_n \delta_{Z_n} + (1 - W_n) Q_{n-1}, \qquad n \ge 1. \qquad (5)$$

In the next theorem, we provide an alternative representation of $Q_n$ and show that $Q_n(\omega)$
converges weakly to some limit probability $Q(\omega)$ for almost all $\omega \in \Omega$, that is, for
each $\omega$ in some set $A \in \mathscr{F}$ with $\mathbb{P}(A) = 1$. In short, we use the notation
$Q_n \Rightarrow Q$ a.s.-$\mathbb{P}$.

Theorem 1. Let $\{Q_n, n \ge 1\}$ be the Markov chain defined by (5). Then:

(i) an equivalent representation of $Q_n$, $n = 1, 2, \ldots$, is

$$Q_n = \sum_{i=1}^{n} q_i^{(n)} \delta_{Z_i}, \qquad (6)$$

where $\sum_{1 \le i \le n} q_i^{(n)} = 1$, $q^{(n)} = (q_1^{(n)}, \ldots, q_n^{(n)})$ has Dirichlet
distribution with parameter $(1, 1, \ldots, 1)$, and $\{q^{(n)}, n \ge 1\}$ and $\{Z_n, n \ge 1\}$ are
independent.

(ii) There exists a r.p.m. $Q$ on $(\Omega, \mathscr{F}, \mathbb{P})$ such that, as $n \to +\infty$,

$$Q_n \Rightarrow Q, \qquad \text{a.s.-}\mathbb{P},$$

where the law of $Q$ is the de Finetti measure of the sequence $\{Z_n, n \ge 1\}$.

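Proof. As far as (i) is concerned, by repeated application of the recursive identity (5),
it can be checked that, for any $n \ge 1$,

$$Q_n = \sum_{i=1}^{n} W_i \prod_{j=i+1}^{n} (1 - W_j)\, \delta_{Z_i},$$

where $W_1 = 1$ almost surely and $\prod_{i+1 \le j \le n}(1 - W_j)$ is defined to be 1 when $i = n$.
Defining $q_i^{(n)} := W_i \prod_{i+1 \le j \le n}(1 - W_j)$, $i = 1, \ldots, n$, it is straightforward to
show that $q^{(n)} = (q_1^{(n)}, \ldots, q_n^{(n)})$ has the Dirichlet distribution with parameter
$(1, 1, \ldots, 1)$ and $\sum_{1 \le i \le n} q_i^{(n)} = 1$, so that (6) holds.

Regarding (ii), by the definition of the Dirichlet distribution, an equivalent representation
of (6) is

$$Q_n = \sum_{i=1}^{n} \frac{\lambda_i}{\sum_{j=1}^{n} \lambda_j}\, \delta_{Z_i},$$

where $\{\lambda_n, n \ge 1\}$ is a sequence of r.v.s independent and identically distributed according
to the standard exponential distribution, independent from $\{Z_n, n \ge 1\}$. Let
$g : \mathbb{X} \to \mathbb{R}$ be any bounded continuous function, and consider

$$G_n := \int_{\mathbb{X}} g(x) Q_n(dx) = \frac{n^{-1} \sum_{i=1}^{n} \lambda_i g(Z_i)}{n^{-1} \sum_{i=1}^{n} \lambda_i}.$$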
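The expression in the denominator converges almost surely to 1 by the strong law of
large numbers. As far as the numerator is concerned, let $Q$ be the r.p.m. defined on
$(\Omega, \mathscr{F}, \mathbb{P})$, such that the r.v.s $\{Z_n, n \ge 1\}$ are independent and
identically distributed conditionally on $Q$; the existence of such a random element is guaranteed
by the de Finetti representation theorem (see, e.g., [32], Theorem 1.49). It can be shown that
$\{\lambda_n g(Z_n), n \ge 1\}$ is a sequence of exchangeable r.v.s and, if
$t_1, \ldots, t_n \in \mathbb{R}$,

$$\mathbb{P}(\lambda_1 g(Z_1) \le t_1, \ldots, \lambda_n g(Z_n) \le t_n)$$
$$= \int_{(0,+\infty)^n} \mathbb{P}(\lambda_1 g(Z_1) \le t_1, \ldots, \lambda_n g(Z_n) \le t_n \mid \lambda_1, \ldots, \lambda_n) \prod_{i=1}^{n} e^{-\lambda_i}\, d\lambda_i$$
$$= E\Biggl[\prod_{i=1}^{n} \int_0^{+\infty} F_{Q^*}\Bigl(\frac{t_i}{\lambda_i}\Bigr) e^{-\lambda_i}\, d\lambda_i\Biggr],$$

where $Q^*(A, \omega) := Q(g^{-1}(A), \omega)$, $\omega \in \Omega$, $A \in \mathscr{B}(\mathbb{R})$, is a
r.p.m. with trajectories in $\mathcal{P}_{\mathbb{R}}$, and $F_{Q^*}$ denotes the random distribution
relative to $Q^*$. This means that, conditionally on $Q^*$, $\{\lambda_n g(Z_n), n \ge 1\}$ is a
sequence of r.v.s independent and identically distributed according to the random distribution
(evaluated in $t$)

$$\int_0^{+\infty} F_{Q^*}\Bigl(\frac{t}{y}\Bigr) e^{-y}\, dy.$$

Of course, $E|\lambda_1 g(Z_1)| = E(\lambda_1) E|g(Z_1)| < +\infty$ since $g$ is bounded. As in [3],
Example 7.3.1, this condition implies

$$\frac{1}{n} \sum_{i=1}^{n} \lambda_i g(Z_i) \to E(\lambda_1 g(Z_1) \mid Q^*) = \int_{\mathbb{R}} t\, d\Bigl(\int_0^{+\infty} F_{Q^*}\Bigl(\frac{t}{y}\Bigr) e^{-y}\, dy\Bigr)$$
$$= \int_{\mathbb{R}} u\, Q^*(du) = \int_{\mathbb{X}} g(x) Q(dx), \qquad \text{a.s.-}\mathbb{P},$$

so that $G_n \to \int_{\mathbb{X}} g(x) Q(dx)$ a.s.-$\mathbb{P}$. By Theorem 2.2 in [1], it follows that
$Q_n \Rightarrow Q$ a.s.-$\mathbb{P}$ as $n \to +\infty$. □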
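
Part (i) of Theorem 1 can be checked numerically: iterating (5) multiplies every existing weight by $1 - W_n$ and appends the new weight $W_n$, which must reproduce the product formula $q_i^{(n)} = W_i \prod_{i+1 \le j \le n}(1 - W_j)$. The sketch below is an illustration only; the function names and the value $n = 8$ are assumptions made for this example.

```python
import random


def weights_by_recursion(ws):
    """Weights of delta_{Z_1}, ..., delta_{Z_n} obtained by iterating (5)."""
    q = []
    for w in ws:
        # Q_n = W_n * delta_{Z_n} + (1 - W_n) * Q_{n-1}
        q = [(1.0 - w) * qi for qi in q] + [w]
    return q


def weights_by_product(ws):
    """Closed form q_i = W_i * prod_{j > i} (1 - W_j) from the proof of Theorem 1."""
    n = len(ws)
    out = []
    for i in range(n):
        prod = ws[i]
        for j in range(i + 1, n):
            prod *= 1.0 - ws[j]
        out.append(prod)
    return out


rng = random.Random(1)
n = 8  # hypothetical chain length
# W_1 = 1 almost surely, W_k ~ Beta(1, k - 1) for k >= 2.
ws = [1.0] + [rng.betavariate(1.0, k - 1) for k in range(2, n + 1)]
q_rec = weights_by_recursion(ws)
q_prod = weights_by_product(ws)
```

Because $W_1 = 1$, the weights sum to one at every step, so each $Q_n$ is a probability measure regardless of the starting point.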



Throughout the paper, $\alpha$ denotes a strictly positive and finite measure on $\mathbb{X}$ with
total mass $a$, unless otherwise stated. If the exchangeable sequence $\{Z_n, n \ge 1\}$ is the Pólya
sequence with parameter $\alpha$, then by Theorem 1(i) $\{Q_n, n \ge 1\}$ is the Markov chain defined
via the recursive identity (4); in particular, by Theorem 1(ii), $Q_n \Rightarrow Q$
a.s.-$\mathbb{P}$ where $Q$ is a Dirichlet process on $\mathbb{X}$ with parameter $\alpha$. This means
that, for any fixed integer $n \ge 1$, the r.p.m. $Q_n$ can be interpreted as an approximation of a
Dirichlet process with parameter $\alpha$. In Appendix A.1, we present an alternative proof of the
weak convergence (convergence of the finite dimensional distribution) of $\{Q_n, n \ge 1\}$ to a
Dirichlet process on $\mathbb{X}$ with parameter $\alpha$, using a combinatorial technique. As a
byproduct of this proof, we obtain an explicit expression for the moment of order
$(r_1, \ldots, r_k)$ of the $k$-dimensional Pólya distribution.

A straightforward generalization of the Markov chain $\{Q_n, n \ge 1\}$ can be obtained by
considering a nonparametric hierarchical mixture model. Let
$k : \mathbb{X} \times \Theta \to \mathbb{R}^+$ be a kernel, that is, $k(x, \vartheta)$ is a measurable
function such that $x \mapsto k(x, \vartheta)$ is a density with respect to some $\sigma$-finite measure
$\lambda$ on $\mathbb{X}$, for any fixed $\vartheta \in \Theta$, where $\Theta$ is a Polish space (with
the usual Borel $\sigma$-field). Let $\{Q_n, n \ge 1\}$ be the Markov chain defined via (5). Then for
each $x \in \mathbb{X}$ we introduce a real-valued Markov chain $\{f_n^{(Q)}(x), n \ge 1\}$ defined via
the recursive identity

$$f_n^{(Q)}(x) = W_n k(x, \vartheta_n) + (1 - W_n) f_{n-1}^{(Q)}(x), \qquad n \ge 1, \qquad (7)$$

where

$$f_n^{(Q)}(x) = \int_{\Theta} k(x, \vartheta) Q_n(d\vartheta).$$

By a straightforward application of Theorem 2.2 in [1], for any fixed $x \in \mathbb{X}$, when
$\vartheta \mapsto k(x, \vartheta)$ is continuous for all $x \in \mathbb{X}$ and bounded by a function
$h(x)$, as $n \to +\infty$, then

$$f_n^{(Q)}(x) \to f^{(Q)}(x) := \int_{\Theta} k(x, \vartheta) Q(d\vartheta), \qquad \text{a.s.-}\mathbb{P}, \qquad (8)$$

where $Q$ is the limit r.p.m. in Theorem 1. For instance, if $Q$ is a Dirichlet process on
$\mathbb{X}$ with parameter $\alpha$, $f^{(Q)}$ is precisely the density in the Dirichlet process
mixture model introduced by [19]. When $h(x)$ is a $\lambda$-integrable function, not only is the
limit $f^{(Q)}(x)$ a random density, but a stronger result than (8) is achieved.

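Theorem 2. If $\vartheta \mapsto k(x, \vartheta)$ is continuous for all $x \in \mathbb{X}$ and bounded
by a $\lambda$-integrable function $h(x)$, then

$$\lim_{n \to +\infty} \int_{\mathbb{X}} |f_n^{(Q)}(x) - f^{(Q)}(x)|\, \lambda(dx) = 0, \qquad \text{a.s.-}\mathbb{P},$$

where $Q$ is the limit r.p.m. in Theorem 1.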

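Proof. The functions $f_n^{(Q)}(x)$ and $f^{(Q)}(x) = \int_{\Theta} k(x, \vartheta) Q(d\vartheta)$,
defined on $\mathbb{X} \times \Omega$, are $\mathscr{X} \otimes \mathscr{F}$-measurable, by a monotone
class argument. In fact, by the kernel's definition, $(x, \vartheta) \mapsto k(x, \vartheta)$ is
$\mathscr{X} \otimes \mathscr{B}(\Theta)$-measurable. Moreover, if $k = \mathbb{1}_A \mathbb{1}_B$,
$A \in \mathscr{X}$ and $B \in \mathscr{B}(\Theta)$, then

$$f^{(Q)}(x, \omega) = \int_{\Theta} k(x, \vartheta) Q(d\vartheta; \omega) = \mathbb{1}_A(x) Q(B; \omega)$$

is $\mathscr{X} \otimes \mathscr{F}$-measurable. Let
$\mathcal{C} = \{C \in \mathscr{X} \otimes \mathscr{B}(\Theta) : \int_{\Theta} \mathbb{1}_C(x, \vartheta) Q(d\vartheta; \omega) \text{ is } \mathscr{X} \otimes \mathscr{F}\text{-measurable}\}$.
Since $\mathcal{C}$ contains the rectangles, it contains the field generated by rectangles, and,
since $\mathcal{C}$ is a monotone class, $\mathcal{C} = \mathscr{X} \otimes \mathscr{B}(\Theta)$. The
assertion holds for $f^{(Q)}$ of the form

$$f^{(Q)}(x, \omega) = \int_{\Theta} k(x, \vartheta) Q(d\vartheta; \omega)$$

since there exists a sequence of simple functions on rectangles which converges pointwise
to $k$. Therefore,
$A := \{(\omega, x) : f_n^{(Q)}(\omega, x) \text{ does not converge to } f^{(Q)}(\omega, x)\} \in \mathscr{F} \otimes \mathscr{X}$.
Then, by Fubini's theorem,

$$\int_{\Omega} \lambda\{x : f_n^{(Q)}(\omega, x) \text{ does not converge to } f^{(Q)}(\omega, x)\}\, \mathbb{P}(d\omega)$$
$$= \int_{\Omega} \int_{\mathbb{X}} \mathbb{1}_A(\omega, x)\, \lambda(dx)\, \mathbb{P}(d\omega)$$
$$= \int_{\mathbb{X}} \mathbb{P}\{\omega : f_n^{(Q)}(\omega, x) \text{ does not converge to } f^{(Q)}(\omega, x)\}\, \lambda(dx)$$
$$= \int_{\mathbb{X}} 0\, \lambda(dx) = 0.$$

Hence, $\mathbb{P}(H) = 1$ where $H$ is the set of $\omega$ such that
$\lambda\{x : f_n^{(Q)}(\omega, x) \text{ does not converge to } f^{(Q)}(\omega, x)\} = 0$. For any
$\omega$ fixed in $H$, it holds $f_n^{(Q)}(\omega, \cdot) \to f^{(Q)}(\omega, \cdot)$, $\lambda$-a.e., so
that by Scheffé's theorem we have

$$\lim_{n \to +\infty} \int_{\mathbb{X}} |f_n^{(Q)}(\omega, x) - f^{(Q)}(\omega, x)|\, \lambda(dx) = 0.$$

The theorem follows since $\mathbb{P}(H) = 1$. □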
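
The recursion (7) and the $L^1$ distance in Theorem 2 can be illustrated on a grid. The sketch below rests on assumptions chosen for this example and not taken from the text: a Gaussian kernel with unit variance, an i.i.d. standard Gaussian sequence $\{\vartheta_n\}$ (a trivially exchangeable sequence, whose limit mixture is the Gaussian density with variance 2), $n = 50$, and a Riemann-sum approximation of the integral with respect to Lebesgue measure.

```python
import math
import random


def gaussian_kernel(x, t):
    """k(x, t): density at x of a Gaussian with mean t and variance 1."""
    return math.exp(-0.5 * (x - t) ** 2) / math.sqrt(2.0 * math.pi)


def mixture_density_on_grid(grid, thetas, ws):
    """Iterate (7): f_n(x) = W_n k(x, theta_n) + (1 - W_n) f_{n-1}(x) on a grid."""
    f = [0.0] * len(grid)
    for w, t in zip(ws, thetas):
        f = [w * gaussian_kernel(x, t) + (1.0 - w) * fx for x, fx in zip(grid, f)]
    return f


rng = random.Random(2)
n = 50  # hypothetical number of steps
ws = [1.0] + [rng.betavariate(1.0, k - 1) for k in range(2, n + 1)]
thetas = [rng.gauss(0.0, 1.0) for _ in range(n)]  # assumed i.i.d., hence exchangeable
h = 0.05
grid = [-10.0 + h * i for i in range(401)]
f_n = mixture_density_on_grid(grid, thetas, ws)

# Limit density under the assumption above: Gaussian with mean 0 and variance 2.
f_lim = [math.exp(-x * x / 4.0) / math.sqrt(4.0 * math.pi) for x in grid]
mass = h * sum(f_n)                                     # should be close to 1
l1 = h * sum(abs(u - v) for u, v in zip(f_n, f_lim))    # Riemann-sum L1 distance
```

Rerunning with larger `n` gives a rough numerical picture of how the distance `l1` behaves as the chain evolves.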

We conclude this section by remarking on an interesting linkage between the Markov
chain $\{Q_n, n \ge 1\}$ and the so-called Newton's algorithm, originally introduced in [25] for
estimating the mixing density when a finite sample is available from the corresponding
mixture model. See also [24] and [23]. Briefly, suppose that $X_1, \ldots, X_n$ are $n$
r.v.s independent and identically distributed according to the density function

$$f(x) = \int_{\Theta} k(x, \vartheta) Q(d\vartheta), \qquad (9)$$

where $k(x, \vartheta)$ is a known kernel dominated by a $\sigma$-finite measure $\lambda$ on
$\mathbb{X}$; assume that the mixing distribution $Q$ is absolutely continuous with respect to some
$\sigma$-finite measure $\mu$ on $\Theta$. [23] proposed to estimate $q = dQ/d\mu$ as follows: fix an
initial estimate $q_1$ and a sequence of weights $w_1, w_2, \ldots, w_n \in (0, 1)$. Given
$x_1, \ldots, x_n$ independent and identically distributed observations from $f$, compute

$$q_i(\vartheta) = (1 - w_i) q_{i-1}(\vartheta) + w_i \frac{k(x_i, \vartheta) q_{i-1}(\vartheta)}{\int_{\Theta} k(x_i, \vartheta) q_{i-1}(\vartheta)\, \mu(d\vartheta)}$$

for $i = 2, 3, \ldots, n$ and produce $q_n$ as the final estimate. We refer to [13, 20, 34], and [21] for
a recent wider investigation of Newton's algorithm. Here we show how Newton's
algorithm is connected to the measure-valued Markov chain $\{Q_n, n \ge 1\}$.

Let us consider $n$ observations from the nonparametric hierarchical mixture model,
that is, $X_i \mid \vartheta_i \sim k(\cdot, \vartheta_i)$ and $\vartheta_i \mid Q \sim Q$ where $Q$ is a
r.p.m. If we observed $\{\vartheta_i, i \ge 1\}$, then by virtue of (ii) in Theorem 1, we could
construct a sequence of distributions

$$Q_i = W_i \delta_{\vartheta_i} + (1 - W_i) Q_{i-1}, \qquad i = 1, \ldots, n,$$

for estimating the limit r.p.m. $Q$, where $\{W_i, i \ge 1\}$ is a sequence of independent r.v.s such
that $W_1 = 1$ almost surely and $W_i$ has Beta distribution with parameters $(1, i - 1)$. This
approximating sequence is precisely the sequence (5). Therefore, taking the expectation of
both sides of the previous recursive equation, and defining $\tilde{Q}_i := E[Q_i]$,
$\tilde{w}_i := E[W_i] = 1/i$, we have

$$\tilde{Q}_i = \tilde{w}_i \delta_{\vartheta_i} + (1 - \tilde{w}_i) \tilde{Q}_{i-1}, \qquad i = 1, \ldots, n, \qquad (10)$$

which can represent a predictive distribution for $\vartheta_{i+1}$ and hence an estimate for $Q$.

However, instead of observing the sequence $\{\vartheta_i, i \ge 1\}$, it is actually the sequence
$\{X_i, i \ge 1\}$ which is observed; in particular, we can assume that $X_1, \ldots, X_n$ are $n$ r.v.s
independent and identically distributed according to the density function (9). Therefore,
instead of (10), we consider

$$\tilde{Q}_i(d\vartheta) = (1 - \tilde{w}_i) \tilde{Q}_{i-1}(d\vartheta) + \tilde{w}_i \frac{k(x_i, \vartheta) \tilde{Q}_{i-1}(d\vartheta)}{\int_{\Theta} k(x_i, \vartheta) \tilde{Q}_{i-1}(d\vartheta)}, \qquad i = 1, \ldots, n,$$

where $\delta_{\vartheta_i}$ in (10) has been substituted (or estimated, if you prefer) by
$k(x_i, \vartheta) \tilde{Q}_{i-1}(d\vartheta) / \int_{\Theta} k(x_i, \vartheta) \tilde{Q}_{i-1}(d\vartheta)$.
Finally, observe that, if $\tilde{Q}_i$ is absolutely continuous, with respect to some
$\sigma$-finite measure $\mu$ on $\Theta$, with density $\tilde{q}_i$, for $i = 1, \ldots, n$, then we can
write

$$\tilde{q}_i(\vartheta) = (1 - \tilde{w}_i) \tilde{q}_{i-1}(\vartheta) + \tilde{w}_i \frac{k(x_i, \vartheta) \tilde{q}_{i-1}(\vartheta)}{\int_{\Theta} k(x_i, \vartheta) \tilde{q}_{i-1}(\vartheta)\, \mu(d\vartheta)}, \qquad i = 1, \ldots, n, \qquad (11)$$

which is precisely a recursive estimator of a mixing distribution proposed by [23] when
the weights are fixed to be $\tilde{w}_i = 1/i$ for $i = 1, \ldots, n$ and the initial estimate is
$E[Q_0]$.
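
The recursive estimator with weights $1/i$ can be sketched numerically by discretizing $\Theta$. Everything specific in the block below is an assumption made for illustration: the Gaussian kernel with unit variance, the uniform grid on $[-6, 6]$ with Riemann-sum integration standing in for $\mu$, the flat initial density, and the synthetic data generated from a two-point mixing distribution at $\pm 2$.

```python
import math
import random


def normal_kernel(x, t):
    """k(x, t): density at x of a Gaussian with mean t and variance 1."""
    return math.exp(-0.5 * (x - t) ** 2) / math.sqrt(2.0 * math.pi)


def newton_recursion(xs, grid, q0, kernel=normal_kernel):
    """Recursive mixing-density estimate with weights 1/i on a uniform grid."""
    h = grid[1] - grid[0]
    q = list(q0)
    for i, x in enumerate(xs, start=1):
        w = 1.0 / i
        like = [kernel(x, t) for t in grid]
        # Riemann-sum approximation of the normalizing integral.
        norm = h * sum(l * qt for l, qt in zip(like, q))
        q = [(1.0 - w) * qt + w * l * qt / norm for l, qt in zip(like, q)]
    return q


rng = random.Random(3)
h = 0.1
grid = [-6.0 + h * j for j in range(121)]
q0 = [1.0 / (h * len(grid))] * len(grid)  # flat initial density on the grid
# Synthetic data: theta_i in {-2, 2} with equal probability, X_i ~ N(theta_i, 1).
xs = [rng.choice([-2.0, 2.0]) + rng.gauss(0.0, 1.0) for _ in range(200)]
q_hat = newton_recursion(xs, grid, q0)
```

Each update is a convex combination of the current density and its one-observation posterior, so the estimate remains a density on the grid after every step.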



3. A generalized Feigin-Tweedie Markov chain

In this section our aim is to define and investigate a class of measure-valued Markov chains
which generalizes the Feigin-Tweedie Markov chain by introducing a fixed integer parameter $n$,
and still has the law of a Dirichlet process with parameter $\alpha$ as the unique invariant
measure. The starting point is the distributional equation (3) introduced by [11]; see
Appendix A.2 for an alternative proof of the solution of the distributional equation (3). All
the proofs of the theorems in this section are in Appendix A.3 for ease of reading.
For a fixed integer $n \ge 1$, let $\theta := \{\theta_m, m \ge 1\}$ be a sequence of independent r.v.s
with Beta distribution with parameter $(n, a)$, $q^{(n)} := \{(q_{m,1}^{(n)}, \ldots, q_{m,n}^{(n)}), m \ge 1\}$,
with $\sum_{1 \le i \le n} q_{m,i}^{(n)} = 1$ for any $m > 0$, be a sequence of independent r.v.s
identically distributed according to a Dirichlet distribution with parameter $(1, \ldots, 1)$ and
$Y := \{(Y_{m,1}, \ldots, Y_{m,n}), m \ge 1\}$ be a sequence of independent r.v.s from a Pólya sequence
with parameter $\alpha$. Moving from such a collection of random elements, for each fixed integer
$n \ge 1$ we define the measure-valued Markov chain $\{P_m^{(n)}, m \ge 0\}$ via the recursive identity

$$P_m^{(n)} = \theta_m \sum_{i=1}^{n} q_{m,i}^{(n)} \delta_{Y_{m,i}} + (1 - \theta_m) P_{m-1}^{(n)}, \qquad m \ge 1, \qquad (12)$$

where $P_0^{(n)} \in \mathcal{P}_{\mathbb{X}}$ is arbitrary. By construction, the Markov chain
$\{P_m, m \ge 0\}$ proposed by [12] and defined via the recursive identity (2) can be recovered from
$\{P_m^{(n)}, m \ge 0\}$ by setting $n = 1$. Following the original idea of [12], by equation (12) we
have defined the Markov chain $\{P_m^{(n)}, m \ge 0\}$ from a distributional equation having as the
unique solution the Dirichlet process. In particular, the Markov chain $\{P_m^{(n)}, m \ge 0\}$ is
defined from the distributional equation (3), which generalizes (1) by substituting the random
probability measure $\delta_Y$ with the random convex linear combination
$\sum_{1 \le i \le n} q_i^{(n)} \delta_{Y_i}$, for any fixed positive integer $n$. Observe that
$\sum_{1 \le i \le n} q_i^{(n)} \delta_{Y_i}$ is an example of the r.p.m. $Q_n$ defined in (6) and
investigated in the previous section, when $\{Z_i\}$ is given by the Pólya sequence $\{Y_i\}$ with
parameter $\alpha$. In particular, Theorem 1 shows that $Q_n$ a.s.-converges to the Dirichlet process
$P$ when $n$ goes to infinity; however, here we assume a different perspective, that is, $n$ is
fixed.

As for the case $n = 1$, the following result holds.

Theorem 3. The Markov chain $\{P_m^{(n)}, m \ge 0\}$ has a unique invariant measure $\Pi$ which
is the law of a Dirichlet process $P$ with parameter $\alpha$.

Another property which still holds in the more general case when $n \ge 1$ is the Harris
ergodicity of the functional Markov chain $\{G_m^{(n)}, m \ge 0\}$, under assumption (13) below.
This condition is equivalent to the finiteness of the r.v. $\int_{\mathbb{X}} |g(x)| P(dx)$; see
also [4].

Theorem 4. Let $g : \mathbb{X} \to \mathbb{R}$ be any measurable function. If

$$\int_{\mathbb{X}} \log(1 + |g(x)|)\, \alpha(dx) < +\infty, \qquad (13)$$

then the Markov chain $\{G_m^{(n)}, m \ge 0\}$ is Harris ergodic with unique invariant measure
$\Pi_g$, which is the law of the random Dirichlet mean $\int_{\mathbb{X}} g(x) P(dx)$.

We conclude the analysis of the Markov chain $\{P_m^{(n)}, m \ge 0\}$ by providing some results
on the ergodicity of the Markov chain $\{G_m^{(n)}, m \ge 0\}$ and by discussing the rate of
convergence. Let $\mathbb{X} = \mathbb{R}$ and let $\{P_m^{(n)}, m \ge 0\}$ be the Markov chain defined
by (12). In particular, for the rest of the section, we consider the mean functional Markov chain
$\{M_m^{(n)}, m \ge 0\}$ defined recursively by

$$M_m^{(n)} = \theta_m \sum_{i=1}^{n} q_{m,i}^{(n)} Y_{m,i} + (1 - \theta_m) M_{m-1}^{(n)}, \qquad m \ge 1, \qquad (14)$$

where $M_0^{(n)} \in \mathbb{R}$ is arbitrary and $n$ is a given positive integer. From Theorem 4,
under the condition $\int_{\mathbb{R}} \log(1 + |x|)\, \alpha(dx) < +\infty$, the Markov chain
$\{M_m^{(n)}, m \ge 0\}$ has the distribution $\mathscr{M}$ of the random Dirichlet mean $M$ as the
unique invariant measure. It is not restrictive to consider only the chain
$\{M_m^{(n)}, m \ge 0\}$, since a more general linear functional $G$ of a Dirichlet process on an
arbitrary Polish space has the same distribution as the mean functional of a Dirichlet process
with parameter $\alpha_g$, where $\alpha_g(B) := \alpha(g^{-1}(B))$ for any
$B \in \mathscr{B}(\mathbb{R})$.
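
The recursion (14) translates directly into a simulation: at each step a fresh block of $n$ values from the Pólya urn, a Dirichlet weight vector with parameter $(1, \ldots, 1)$ and a Beta r.v. with parameter $(n, a)$ are drawn. The sketch below is illustrative only; the function names, the Uniform(0, 1) choice for $\alpha_0$, the values $a = 100$, $n \in \{1, 20\}$, the starting point 0 and the horizon of 50 steps are assumptions made for this example.

```python
import random


def polya_block(n, a, draw_alpha0, rng):
    """Draw (Y_{m,1}, ..., Y_{m,n}) from the Polya urn with total mass a."""
    ys = []
    for j in range(n):
        if rng.random() < a / (a + j):
            ys.append(draw_alpha0(rng))   # fresh value from alpha_0
        else:
            ys.append(rng.choice(ys))     # repeat a past value
    return ys


def mean_chain(n, a, draw_alpha0, steps, m0=0.0, seed=0):
    """Path of M_m = theta_m * T_m + (1 - theta_m) * M_{m-1}, as in (14)."""
    rng = random.Random(seed)
    path = [m0]
    for _ in range(steps):
        theta = rng.betavariate(n, a)                  # theta_m ~ Beta(n, a)
        ys = polya_block(n, a, draw_alpha0, rng)
        e = [rng.expovariate(1.0) for _ in range(n)]   # normalized: Dirichlet(1,...,1)
        t_n = sum(ei * yi for ei, yi in zip(e, ys)) / sum(e)
        path.append(theta * t_n + (1.0 - theta) * path[-1])
    return path


def uniform01(rng):
    return rng.random()


path_n1 = mean_chain(n=1, a=100.0, draw_alpha0=uniform01, steps=50)
path_n20 = mean_chain(n=20, a=100.0, draw_alpha0=uniform01, steps=50)
```

With $a = 100$ the step size $\theta_m$ has mean $n/(n + a)$, so the path with $n = 20$ moves away from the starting point much faster than the path with $n = 1$, at the price of $2n - 1$ additional draws per iteration.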

Theorem 5. The Markov chain $\{M_m^{(n)}, m \ge 0\}$ satisfies the following properties:

(i) $\{M_m^{(n)}, m \ge 0\}$ is a stochastically monotone Markov chain;

(ii) if further

$$E[|Y_{1,1}|] = \int_{\mathbb{R}} |x|\, \alpha_0(dx) < +\infty, \qquad (15)$$

then $\{M_m^{(n)}, m \ge 0\}$ is a geometrically ergodic Markov chain;

(iii) if the support of $\alpha$ is bounded then $\{M_m^{(n)}, m \ge 0\}$ is a uniformly ergodic Markov
chain.

Recall that the stochastic monotonicity property of $\{M_m^{(n)}, m \ge 0\}$ allows one to consider
exact sampling (see [27]) for $M$ via $\{M_m^{(n)}, m \ge 0\}$.

Remark 1. Condition (15) can be relaxed. If the following condition holds

$$E[|Y_{1,1}|^s] = \int_{\mathbb{R}} |y|^s\, \alpha_0(dy) < +\infty \qquad \text{for some } 0 < s < 1, \qquad (16)$$

then the Markov chain $\{M_m^{(n)}, m \ge 0\}$ is geometrically ergodic. See Appendix A.3 for the
proof. If, for instance, $\alpha_0$ is a standard Cauchy distribution and $a > 0$, condition (16)
is fulfilled so that $\{M_m^{(n)}, m \ge 0\}$ will turn out to be geometrically ergodic for any fixed
integer $n$.

S. Favaro, A. Guglielmi and S.G. Walker

From Theorem 1, $T^{(n)} := \sum_{1 \le i \le n} q_i^{(n)} Y_i = \int_{\mathbb{R}} x \bigl(\sum_{1 \le i \le n} q_i^{(n)} \delta_{Y_i}\bigr)(dx)$
converges in distribution to the random Dirichlet mean $M$ as $n \to +\infty$; so it is clear that,
for a fixed integer $n$, the law of $T^{(n)}$ approximates the law of $M$ and that the approximation
will be better for $n$ large. If we reconsider (14), written as

$$M_m^{(n)} = \theta_m T_m^{(n)} + (1 - \theta_m) M_{m-1}^{(n)}, \qquad m \ge 1,$$

since the innovation term $T^{(n)}$ is an approximation in distribution of the limit (as
$m \to +\infty$) r.v. $M$, it is intuitive that the rate of convergence will increase as $n$ gets
larger. This is confirmed by the description of small sets $C^{(n)}$ in (22) (in the proof
of Theorem 5). In fact, under (15) or (16), the Markov chain $\{M_m^{(n)}, m \ge 0\}$ is
geometrically or uniformly ergodic since it satisfies a Foster-Lyapunov condition
$PV(x) := \int_{\mathbb{R}} V(y) p(x, dy) \le \lambda V(x) + b \mathbb{1}_{C^{(n)}}(x)$ for a suitable
function $V$, a small set $C^{(n)}$ and constants $b < +\infty$, $0 < \lambda < 1$. In particular, the
small sets $C^{(n)}$ generalize the corresponding small set $C$ obtained in Theorem 1 in [15], which
can be recovered by setting $n = 1$, that is, $C = [-K(\lambda), K(\lambda)]$ where

$$K(\lambda) := \frac{1 - \lambda + 1/(1 + a)\, E[|Y_{1,1}|]}{\lambda - a/(1 + a)}.$$

Here the size of the small set $C^{(n)}$ of $\{M_m^{(n)}, m \ge 0\}$ can be controlled by an
additional parameter $n$, suggesting that the upper bounds on the rate of convergence of the chain
$\{M_m^{(n)}, m \ge 0\}$ depend on $n$ too.

However, if we would establish an explicit upper bound on the rate of convergence, we
would need results like Theorem 2.2 in [30], or Theorems 5.1 and 5.2 in [29]. All these
results need a minorization condition to hold for the $m_0$th step transition probability
$p_{m_0}^{(n)}(x, A) := \mathbb{P}(M_{m_0}^{(n)} \in A \mid M_0^{(n)} = x)$ for any
$A \in \mathscr{B}(\mathbb{R})$ and $x \in \mathbb{R}$, for some positive integer $m_0$ and all $x$ in a
small set; in particular, if $\inf_{x \in C^{(n)}} f(z \mid x) \ge p_0^{(n)}(z)$, where $f(z \mid x)$
is the density of $p_1^{(n)}(x, \cdot)$ and $p_0^{(n)}(z)$ is some density such that
$\varepsilon(n) := \int_{\mathbb{R}} p_0^{(n)}(z)\, dz > 0$, then

$$p_1^{(n)}(x, A) \ge \varepsilon(n) \int_A \frac{p_0^{(n)}(z)}{\varepsilon(n)}\, dz = \varepsilon(n) \nu(A), \qquad A \in \mathscr{B}(\mathbb{R}),\ x \in C^{(n)},$$

where $\nu$ is a probability measure on $\mathbb{R}$. In order to check the validity of our intuition
that the rate of convergence will increase as $n$ gets larger, the function $\varepsilon(n)$ should be
increasing with $n$ in order to prove that the uniform error (when the support of the $Y_i$'s is
bounded) in total variation between the law of $M_m^{(n)}$ given $M_0^{(n)}$ and its limit
distribution decreases as $n$ increases. If $f_{T^{(n)}}$ is the density of $T^{(n)}$, which exists
since, conditioning on the $Y_i$'s, $T^{(n)}$ is a random Dirichlet mean, then

$$p_1^{(n)}(x, A) = \int_A f(z \mid x)\, dz = \int_{\mathbb{R}} \mathbb{P}(\theta_1 y + (1 - \theta_1) x \in A \mid T^{(n)} = y, M_0^{(n)} = x)\, f_{T^{(n)}}(y)\, dy.$$

Therefore, the density function corresponding to $p_1^{(n)}(x, A)$ is
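$$f(z \mid x) = \begin{cases} \dfrac{1}{B(a, n)} \displaystyle\int_{-\infty}^{z} \dfrac{(x - z)^{n-1} (z - y)^{a-1}}{(x - y)^{a+n-1}}\, f_{T^{(n)}}(y)\, dy, & \text{if } z < x, \\[2ex] \dfrac{1}{B(a, n)} \displaystyle\int_{z}^{+\infty} \dfrac{(z - x)^{n-1} (y - z)^{a-1}}{(y - x)^{a+n-1}}\, f_{T^{(n)}}(y)\, dy, & \text{if } z > x, \end{cases}$$

$$= \frac{1}{B(a, n)} \int_0^1 t^{n-2} (1 - t)^{a-1} f_{T^{(n)}}\Bigl(\frac{z - (1 - t) x}{t}\Bigr)\, dt.$$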

Unfortunately, the explicit expression of $f_{T^{(n)}}$, which for $n = 1$ reduces to the density of
$\alpha_0$ if it exists, is not simple; from Proposition 5 in [28] for instance, for
$y \in \mathbb{R}$,

$$f_{T^{(n)}}(y) = \int_{\mathbb{R}^n} f_{T^{(n)}}(y; y_1, \ldots, y_n)\, F_{(Y_1, \ldots, Y_n)}(dy_1, \ldots, dy_n),$$

where, when y for i = 1, . . . , n, 

(y; 2/1, • ■ • , Vn) = ^ n (i + t2(y -y)2)i/2 ( E arctan(i(y, - y)) \ di; 



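here $F_{(Y_1, \ldots, Y_n)}$ is the distribution of $(Y_1, \ldots, Y_n)$ which, by definition, can be
recovered by the product rule
$F_{(Y_1, \ldots, Y_n)}(y_1, \ldots, y_n) = F_{Y_1}(y_1) F_{Y_2 \mid Y_1}(y_2; y_1) \cdots F_{Y_n \mid Y_1, \ldots, Y_{n-1}}(y_n; y_1, \ldots, y_{n-1})$
with $F_{Y_1}(y) = \alpha_0((-\infty, y])$ and

$$F_{Y_j \mid Y_1, \ldots, Y_{j-1}}(y; y_1, \ldots, y_{j-1}) = \frac{a}{a + j - 1}\, \alpha_0((-\infty, y]) + \frac{1}{a + j - 1} \sum_{i=1}^{j-1} \mathbb{1}_{(-\infty, y]}(y_i).$$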

However, some remarks on the asymptotic behavior of e{n) can be made under suitable 
conditions. Since, 

l/^,„, f ^-{^-i)A > ^^^^^ fz-{l~t)x 



t ' \ t J --.^ ■ y ^ 

if the support of the $Y_i$'s is bounded (for instance equal to $[0, 1]$) and the derivative of
$f_{T^{(n)}}$ is bounded by some constant then, by Taylor expansion of $f_{T^{(n)}}$, we have



' sup xf{l\i- t)»i"-v;,„„ ('-^\ it I iz 



B[a,n) ^gc-f") Jo \Jo V * 



(17) 



>— KI&'\X) I (l-t)°i"-2jt 

a + n Jq 

a + n 71 — 1 
For a large enough $n_0$, if we fix $\lambda$ equal to some positive constant $C$ which is greater
than $a/(a + n)$ for all $n > n_0$, then $K^{(n)}(\lambda)$ is bounded above by

$$\frac{1 - C + E[|Y_{1,1}|]}{C - a/(a + n_0)}.$$

The second term in (17) is negligible with respect to the first term, which increases as $n$
increases. As we mentioned, when the support of the $Y_i$'s is bounded, from Theorem 16.2.4
in [22] it follows that the error in total variation between the $m$th transition probability
of the Markov chain $\{M_m^{(n)}, m \ge 0\}$ and the limit distribution $\mathscr{M}$ is less than
$(1 - \varepsilon(n))^m$. This error decreases as $n$ increases beyond $n_0$.

So far we have provided only some qualitative features of the rate of convergence;
however, the derivation of the explicit bound on the rate of convergence of $M_m^{(n)}$ to $M$
for each fixed $n$, via $p_0^{(n)}$ and $\varepsilon(n)$, is still an open problem. Some examples
confirm our conjecture that the convergence of the Markov chain $\{M_m^{(n)}, m \ge 0\}$ improves as
$n$ increases. Nonetheless, we must point out that simulating the innovation term $T^{(n)}$ for $n$
larger than 1 will be more computationally expensive, and also that this cost will be
increasing as $n$ increases. In fact, if $n$ is greater than one, $2n - 1$ more r.v.s must be
drawn at each iteration of the Markov chain ($n - 1$ more from the Pólya sequence and $n$
more from the finite-dimensional Dirichlet distribution). Moreover, we compared the total
user times of the R function simulating $\{M_m^{(n)}, m = 0, \ldots, 500\}$. We found that all these
times were small, of course depending on $\alpha_0$, but not on the total mass parameter $a$ (all
the other values being fixed). The total user times when $n = 2$ were about 50% greater
than those for $n = 1$, while they were about 5, 10 and 50 times greater when $n = 10$, 20
and 100, respectively, for a number of total iterations equal to 500. From the following
examples, we found that values of $n$ between 2 and 20 are a good compromise between a fast
rate of convergence and a moderate computational cost.

Example 1. Let $\alpha_0$ be a Uniform distribution on (0, 1) and let $a$ be the total mass. In
this case $E[|Y_{1,1}|] = 1/2$ so that for any fixed integer $n$, the chain will be geometrically
ergodic; moreover, it can be proved that (0, 1) is small so that the chain is uniformly
ergodic. When $a = 1$, [15] showed that the convergence of $\{M_m, m \ge 0\}$ is very good
and there is no need to consider the chain with $n > 1$. We consider the cases $a = 10$,
50 and 100, and for each of them we run the chain for $n = 1$, 2, 10 and 20. We found
that the trace plots do not depend on the initial values. In Figure 1, we give the trace
plots of $M_m^{(n)}$ when $M_0^{(n)} = 0$. Observe that convergence improves as $n$ increases for any
fixed value of $a$; however, the improvement is more evident from the graph for large $a$.
When $a = 100$ the convergence of the chain for $n = 1$ seems to occur at about $m = 350$,
while for $n = 20$ the convergence is at about a value between 50 and 75. For these values
of $n$, the total user times to reach convergence were 0.038 seconds for the former, and
0.066 seconds for the latter. Moreover, the total user times to simulate 500 iterations
of $\{M_m^{(n)}, m \ge 0\}$ were 0.05, 0.071, 0.226, 0.429, 2.299 seconds when $n = 1, 2, 10, 20, 100$,
respectively.
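
To make the simulation in Example 1 concrete, the following is a minimal Python sketch of one way to generate a path of the mean chain. It is not the R function used for the timings above. It assumes the update $M_m^{(n)} = \theta_m \sum_{i=1}^n q_{m,i}^{(n)} Y_{m,i} + (1-\theta_m) M_{m-1}^{(n)}$ with $\theta_m$ Beta$(n, a)$, $(q_{m,1}^{(n)}, \ldots, q_{m,n}^{(n)})$ Dirichlet$(1, \ldots, 1)$ and $(Y_{m,1}, \ldots, Y_{m,n})$ drawn from a Polya urn with base $\alpha_0$ and total mass $a$. The function names and the `base_draw` argument are hypothetical and introduced only for illustration.

```python
import random


def polya_sample(n, a, base_draw, rng):
    """Draw Y_1, ..., Y_n from a Polya urn with total mass a and base sampler base_draw."""
    ys = []
    for j in range(n):
        # With probability a / (a + j) draw a fresh value from alpha_0,
        # otherwise repeat one of the j previous values chosen uniformly.
        if rng.random() < a / (a + j):
            ys.append(base_draw(rng))
        else:
            ys.append(rng.choice(ys))
    return ys


def dirichlet_ones(n, rng):
    """Dirichlet(1, ..., 1) weights obtained by normalizing unit exponentials."""
    e = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(e)
    return [x / total for x in e]


def simulate_mean_chain(n, a, base_draw, m0=0.0, iters=500, seed=0):
    """Return the path M_0, ..., M_iters of the mean chain (assumed update, see lead-in)."""
    rng = random.Random(seed)
    path = [m0]
    for _ in range(iters):
        theta = rng.betavariate(n, a)  # assumption: theta_m ~ Beta(n, a)
        q = dirichlet_ones(n, rng)
        y = polya_sample(n, a, base_draw, rng)
        new_mass = sum(qi * yi for qi, yi in zip(q, y))
        path.append(theta * new_mass + (1.0 - theta) * path[-1])
    return path
```

For instance, `simulate_mean_chain(20, 100.0, lambda r: r.random())` produces a path comparable in spirit to the $n = 20$, $a = 100$ trace in Figure 1; each iteration draws $2n + 1$ random variables, in line with the cost discussion above.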




Figure 1. Traceplots of the Markov chain $\{M_m^{(n)}, m \ge 0\}$ with $\alpha_0$ the Uniform distribution on
(0, 1), $a = 10, 50, 100$ and $n = 1$ (solid blue line), $n = 2$ (dashed red line), $n = 10$ (dotted black
line) and $n = 20$ (dot-dashed violet line).



This behaviour is confirmed in the next example, where the support of the measure $\alpha$
is assumed to be unbounded.

Example 2. Let $\alpha_0$ be a Gaussian distribution with parameter (0, 1) and let $a = 10$.
The behavior of $\{M_m, m \ge 0\}$ has been considered in [15]. Figure 2 displays the trace plots
of $\{M_m^{(n)}, m \ge 0\}$ for three different initial values ($M_0^{(n)} = -3, 0, 3$), with $n = 1, 10, 20$. Also
in this case, it is clear that the convergence improves as $n$ increases. As far as the total
user times are concerned, we drew conclusions similar to those in Example 1.

The next is an example in the mixture models context. 

Example 3. Let us consider a Gaussian kernel $k(x, \theta)$ with unknown mean $\theta$ and known
variance equal to 1. If we consider the random density $f(x) = \int_{\mathbb{R}} k(x, \theta)\,dP(\theta)$, where $P$
is a Dirichlet process with parameter $\alpha$, then, for any fixed $x$, $f(x)$ is a random Dirichlet
mean. Therefore, if we consider the measure-valued Markov chain $\{P_m^{(n)}, m \ge 0\}$ defined
recursively as in (12), we define a sequence of random densities $\{f_m^{(n)}(x), m \ge 0\}$, where

$$f_m^{(n)}(x) := \int_{\mathbb{R}} k(x, \theta)\,dP_m^{(n)}(\theta) = \theta_m \sum_{i=1}^n q_{m,i}^{(n)} k(x, Y_{m,i}) + (1 - \theta_m) f_{m-1}^{(n)}(x).$$

In each panel of Figure 3, we drew $f_m^{(n)}(x)$ for different values of $m$ when $n$ is fixed. In particular, we
fixed $\alpha_0$ to be a Gaussian distribution with parameter (0, 1), and let $a = 100$; in this case,
since the "variance" of $P$ is small, the mean density $E[f](x) = \int_{\mathbb{R}} k(x, \theta)\,\alpha_0(d\theta)$ (Gaussian
with parameter (0, 2)) will be very close to the random function $f(x)$, so that it can be



Figure 2. Traceplots of the Markov chain $\{M_m^{(n)}, m \ge 0\}$ with $\alpha_0$ the Gaussian distribution
with parameter (0, 1), $a = 10$, $n = 1, 10, 20$ and $M_0^{(n)} = -3$ (dashed red line), $M_0^{(n)} = 0$ (solid
blue line) and $M_0^{(n)} = 3$ (dotted black line).



considered an approximation of the "true" density $f(x)$. From the plots, it is clear that
the convergence improves as $n$ increases: when $n = 1$, only $f_{1000}^{(1)}(x)$ is close enough to the
mean density $E[f](x)$, while, if $n = 20$, $f_{100}^{(20)}(x)$, as well as the successive iterations, is
a good approximation of $f(x)$. In any case, observe that even if the "true" density $f(x)$ is
unknown, as when $a = 1$, the improvement (as $n$ increases) is clear as well; see Figure 4,
where 5 draws of $f_m^{(n)}(x)$, $m = 1, 100, 1000$, are plotted for different values of $n$.
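
The recursion for the random densities in Example 3 can be sketched numerically on a grid. The Python sketch below is illustrative only: it assumes the same ingredients as the mean chain ($\theta_m$ Beta$(n, a)$, Dirichlet$(1, \ldots, 1)$ weights, Polya draws with base $\alpha_0$ Gaussian with parameter (0, 1)), and it starts from $f_0(x) = k(x, 0)$, which is an arbitrary choice made here and not taken from the paper. The names `gauss_pdf`, `mean_density` and `density_chain` are hypothetical.

```python
import math
import random


def gauss_pdf(x, mean, var):
    """Density of a Gaussian with the given mean and variance, evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mean_density(x):
    """E[f](x): Gaussian with parameter (0, 2), i.e. k(x, theta) mixed over alpha_0."""
    return gauss_pdf(x, 0.0, 2.0)


def density_chain(n, a, grid, iters, seed=0):
    """Return f_iters^{(n)} on the grid, starting from f_0(x) = k(x, 0) (assumed start)."""
    rng = random.Random(seed)
    f = [gauss_pdf(x, 0.0, 1.0) for x in grid]
    for _ in range(iters):
        theta = rng.betavariate(n, a)  # assumption: theta_m ~ Beta(n, a)
        e = [rng.expovariate(1.0) for _ in range(n)]
        total = sum(e)
        q = [v / total for v in e]  # Dirichlet(1, ..., 1) weights
        ys = []
        for j in range(n):  # Polya urn draws with base alpha_0 = N(0, 1)
            if rng.random() < a / (a + j):
                ys.append(rng.gauss(0.0, 1.0))
            else:
                ys.append(rng.choice(ys))
        # f_m(x) = theta * sum_i q_i k(x, Y_i) + (1 - theta) * f_{m-1}(x)
        f = [
            theta * sum(qi * gauss_pdf(x, yi, 1.0) for qi, yi in zip(q, ys))
            + (1.0 - theta) * fx
            for x, fx in zip(grid, f)
        ]
    return f
```

Each iterate is a convex combination of probability densities, so it remains a density; comparing `density_chain(...)` with `mean_density` on the same grid gives a rough numerical counterpart of the visual comparison described above.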



4. Concluding remarks and developments 

The paper [12] constitutes, as far as we know, the first work highlighting the interplay
between Bayesian nonparametrics on the one side and the theory of measure-valued
Markov chains on the other. In the present paper, we have further studied such interplay
by introducing and investigating a new measure-valued Markov chain $\{Q_n, n \ge 1\}$ defined
via exchangeable sequences of r.v.s. Two applications related to Bayesian nonparametrics
have been considered: the first gives evidence that $\{Q_n, n \ge 1\}$ is closely related to
Newton's algorithm of the mixture of Dirichlet process model, while the second shows
how $\{Q_n, n \ge 1\}$ can be applied in order to define a generalization of the Feigin-Tweedie
Markov chain.

An interesting development consists in investigating whether there are any new applications
related to the Feigin-Tweedie Markov chain apart from the well-known application
in the field of linear functionals of the Dirichlet process (see, e.g., [14]). The
proposed generalization of the Feigin-Tweedie Markov chain represents a large class of
measure-valued Markov chains $\{\{P_m^{(n)}, m \ge 0\}, n \in \mathbb{N}\}$ maintaining all the same properties
of the Feigin-Tweedie Markov chain; in other terms, we have increased the flexibility
of the Feigin-Tweedie Markov chain via a further parameter $n \in \mathbb{N}$. We believe that
a number of different interpretations for $n$ can be investigated in order to extend the
applicability of the Feigin-Tweedie Markov chain.

In this respect, an intuitive and simple extension is related to the problem of
defining a (bivariate) vector of measure-valued Markov chains $\{(P_m^{(n_1)}, P_m^{(n_2)}), m \ge 0\}$,
where, for each fixed $m$, $(P_m^{(n_1)}, P_m^{(n_2)})$ is a vector of dependent random probabilities,
$n_1 < n_2$ being fixed positive integers. Marginally, the two sequences $\{P_m^{(n_1)}, m \ge 0\}$ and
$\{P_m^{(n_2)}, m \ge 0\}$ are defined via the recursive identity (12); the dependence is achieved
using the same Polya sequence $(Y_{m,1}, \ldots, Y_{m,n_1}, \ldots, Y_{m,n_2})$ and assuming dependence
in $(\theta_m^{(n_1)}, \theta_m^{(n_2)})$ or between $(q_{m,1}^{(n_1)}, \ldots, q_{m,n_1}^{(n_1)})$ and $(q_{m,1}^{(n_2)}, \ldots, q_{m,n_2}^{(n_2)})$. For instance, if, for
each $m$, $Z_{m,0}, Z_{m,1}, \ldots, Z_{m,n_1}, \ldots, Z_{m,n_2}$ are independent r.v.s, $Z_{m,0}$ with an Exponential
distribution with parameter $a$, $Z_{m,i}$ with an Exponential distribution with parameter 1, we
could define

$$(\theta_m^{(n_1)}, \theta_m^{(n_2)}) := \left( \frac{\sum_{i=1}^{n_1} Z_{m,i}}{Z_{m,0} + \sum_{i=1}^{n_1} Z_{m,i}},\ \frac{\sum_{i=1}^{n_2} Z_{m,i}}{Z_{m,0} + \sum_{i=1}^{n_2} Z_{m,i}} \right).$$

Of course, the dependence is related to the difference between $n_1$ and $n_2$. Work on this
is ongoing.



Appendix 

A.1. Weak convergence for the Markov chain $\{Q_n, n \ge 1\}$

A proof of the weak convergence of the sequence $\{Q_n = \sum_{i=1}^n q_i^{(n)} \delta_{Y_i}\}_n$ of r.p.m.s on X to
a Dirichlet process $P$ is provided here, when the $Y_j$'s are a Polya sequence with parameter
measure $\alpha$. The result automatically follows from Theorem 1(ii), but this proof is






interesting per se, since we use a combinatorial technique; moreover an explicit expression
for the moment of order $(r_1, \ldots, r_k)$ of the $k$-dimensional Polya distribution is obtained.

Proposition. Let $Q_n$ be defined as in (5), where $\{Y_j, j \ge 1\}$ is a Polya sequence with
parameter $\alpha$. Then

$$Q_n \Rightarrow P,$$

where $P$ is a Dirichlet process on X with parameter $\alpha$.

Proof. By Theorem 4.2 in [18], it is sufficient to prove that for any measurable partition
$B_1, \ldots, B_k$ of X,

$$(Q_n(B_1), \ldots, Q_n(B_k)) \Rightarrow (P(B_1), \ldots, P(B_k)),$$

characterizing the distribution of the limit. For any given measurable partition $B_1, \ldots, B_k$
of X, by conditioning on $Y_1, \ldots, Y_n$, it can be checked that $(Q_n(B_1), \ldots, Q_n(B_{k-1}))$ is
distributed according to a Dirichlet distribution with empirical parameter
$(\sum_{1\le i\le n} \delta_{Y_i}(B_1), \ldots, \sum_{1\le i\le n} \delta_{Y_i}(B_{k-1}), \sum_{1\le i\le n} \delta_{Y_i}(B_k))$, and

$$P(\#\{i : Y_i \in B_1\} = j_1, \ldots, \#\{i : Y_i \in B_k\} = j_k) = \binom{n}{j_1 \cdots j_k} \frac{(\alpha(B_1))_{j_1\uparrow 1} \cdots (\alpha(B_k))_{j_k\uparrow 1}}{(a)_{n\uparrow 1}}, \tag{18}$$

where $(j_1, \ldots, j_k) \in \mathcal{D}^{(0)}_{k,n}$ with $\mathcal{D}^{(0)}_{k,n} := \{(j_1, \ldots, j_k) \in \{0, \ldots, n\}^k : \sum_{1\le i\le k} j_i = n\}$. For
any $k$-uple of nonnegative integers $(r_1, \ldots, r_k)$, we are going to compute the limit, for
$n \to +\infty$, of the moment



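$$E\left[ (Q_n(B_1))^{r_1} \cdots (Q_n(B_{k-1}))^{r_{k-1}} \left( 1 - \sum_{i=1}^{k-1} Q_n(B_i) \right)^{r_k} \right]
= \sum_{(j_1,\ldots,j_k) \in \mathcal{D}^{(0)}_{k,n}} \binom{n}{j_1 \cdots j_k} \frac{(\alpha(B_1))_{j_1\uparrow 1} \cdots (\alpha(B_k))_{j_k\uparrow 1}}{(a)_{n\uparrow 1}} \, \frac{(j_1)_{r_1\uparrow 1} \cdots (j_k)_{r_k\uparrow 1}}{(n)_{(r_1+\cdots+r_k)\uparrow 1}}, \tag{19}$$

where in general $(x)_{n\uparrow\alpha}$ denotes the Pochhammer symbol for the $n$th factorial power of $x$
with increment $\alpha$, that is $(x)_{n\uparrow\alpha} := \prod_{0\le i\le n-1} (x + i\alpha)$. We will show that, as $n \to +\infty$,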



$$E\left[ (Q_n(B_1))^{r_1} \cdots (Q_n(B_{k-1}))^{r_{k-1}} \left( 1 - \sum_{i=1}^{k-1} Q_n(B_i) \right)^{r_k} \right]
\to E\left[ (P(B_1))^{r_1} \cdots (P(B_{k-1}))^{r_{k-1}} \left( 1 - \sum_{i=1}^{k-1} P(B_i) \right)^{r_k} \right],$$

where $P$ is a Dirichlet process on X with parameter measure $\alpha$, that is, the r.v.
$(P(B_1), \ldots, P(B_{k-1}))$ has Dirichlet distribution with parameter $(\alpha(B_1), \ldots, \alpha(B_{k-1}),
\alpha(B_k))$. This will be sufficient to characterize the distribution of the limit $Q^*$, because
of the boundedness of the support of the limit distribution. First of all, we prove the
convergence for $k = 2$, which corresponds to the one-dimensional case. In particular, we
have

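$$E[(Q_n(B_1))^{r_1} (1 - Q_n(B_1))^{r_2}]
= \frac{1}{(n)_{(r_1+r_2)\uparrow 1}} \sum_{(j_1,j_2) \in \mathcal{D}^{(0)}_{2,n}} \binom{n}{j_1\, j_2} \frac{(\alpha(B_1))_{j_1\uparrow 1} (\alpha(B_2))_{j_2\uparrow 1}}{(a)_{n\uparrow 1}} (j_1)_{r_1\uparrow 1} (j_2)_{r_2\uparrow 1}$$

$$= \frac{1}{(n)_{(r_1+r_2)\uparrow 1}} \sum_{t_1=0}^{r_1} |s(r_1,t_1)| \sum_{s_1=0}^{t_1} S(t_1,s_1) \sum_{t_2=0}^{r_2} |s(r_2,t_2)| \sum_{s_2=0}^{t_2} S(t_2,s_2)
\times \sum_{(j_1,j_2) \in \mathcal{D}^{(0)}_{2,n}} \binom{n}{j_1\, j_2} \frac{(\alpha(B_1))_{j_1\uparrow 1} (\alpha(B_2))_{j_2\uparrow 1}}{(a)_{n\uparrow 1}} (j_1)_{s_1\downarrow 1} (j_2)_{s_2\downarrow 1},$$

where $(x)_{n\downarrow 1} := (-1)^n (-x)_{n\uparrow 1}$ and $s(\cdot,\cdot)$ and $S(\cdot,\cdot)$ are the Stirling numbers of the first
and second kind, respectively. Let us consider the following numbers, where $s_1, s_2$ are
nonnegative integers and $n = 1, 2, \ldots$,

$$C_n^{(s_1,s_2)} := \sum_{(j_1,j_2) \in \mathcal{D}^{(0)}_{2,n}} \binom{n}{j_1\, j_2} \frac{(\alpha(B_1))_{j_1\uparrow 1} (\alpha(B_2))_{j_2\uparrow 1}}{(a)_{n\uparrow 1}} (j_1)_{s_1\downarrow 1} (j_2)_{s_2\downarrow 1}$$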

and prove they satisfy a recursive relation. In particular, 

{SUS2) _ >r- f^+A ('^(-gl))3ltl(Q^(-^2))32tl 



C„+i - 2^ -T Uljsi;iU2js2il 

^ \.?1,.?2/ (a)(n+l)tl 



^ ^n + l Va(gi))(,-,+i)ti(Q(-B2))(„-j,)ti ,, , , . 

^ + (a)(«+i)ti 



Ji=0 

^ (?i + + 51-1) ^(s,-i.s,) , " + (^(^1,^2) 

(a + n) " (a + n) " 

so that the following recursive equation holds

$$C_{n+1}^{(s_1,s_2)} = \frac{(n+1)(\alpha(B_1) + s_1 - 1)}{a + n}\, C_n^{(s_1-1,s_2)} + \frac{n+1}{a+n}\, C_n^{(s_1,s_2)}. \tag{20}$$

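Therefore, starting from $C_n^{(0,0)} = 1$, $C_n^{(0,1)} = n\alpha(B_2)/a$, we have

$$C_n^{(1,1)} = \sum_{i=0}^{n-1} \frac{(i+1)\alpha(B_1)}{a+i}\, C_i^{(0,1)} \prod_{j=i+1}^{n-1} \frac{j+1}{a+j}
= \frac{\Gamma(n+1)\alpha(B_1)}{\Gamma(a+n)} \sum_{i=0}^{n-1} \frac{\Gamma(a+i)}{\Gamma(i+1)}\, C_i^{(0,1)}$$

$$= \frac{\Gamma(n+1)\alpha(B_1)}{\Gamma(a+n)} \sum_{i=0}^{n-1} \frac{\Gamma(a+i)}{\Gamma(i+1)}\, \frac{i\,\alpha(B_2)}{a}
= \frac{\alpha(B_1)\alpha(B_2)(n)_{2\downarrow 1}}{(a)_{2\uparrow 1}}$$

and by (20) we obtain $C_n^{(s_1,s_2)} = (\alpha(B_1))_{s_1\uparrow 1} (\alpha(B_2))_{s_2\uparrow 1} (n)_{(s_1+s_2)\downarrow 1} / (a)_{(s_1+s_2)\uparrow 1}$. Thus,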
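$$\lim_{n\to+\infty} E[(Q_n(B_1))^{r_1} (1 - Q_n(B_1))^{r_2}]
= \sum_{t_1=0}^{r_1} |s(r_1,t_1)| \sum_{s_1=0}^{t_1} S(t_1,s_1) \sum_{t_2=0}^{r_2} |s(r_2,t_2)| \sum_{s_2=0}^{t_2} S(t_2,s_2) \lim_{n\to+\infty} \frac{C_n^{(s_1,s_2)}}{(n)_{(r_1+r_2)\uparrow 1}}$$

$$= |s(r_1,r_1)|\, S(r_1,r_1)\, |s(r_2,r_2)|\, S(r_2,r_2)\, \frac{(\alpha(B_1))_{r_1\uparrow 1} (\alpha(B_2))_{r_2\uparrow 1}}{(a)_{(r_1+r_2)\uparrow 1}}
= \frac{(\alpha(B_1))_{r_1\uparrow 1} (\alpha(B_2))_{r_2\uparrow 1}}{(a)_{(r_1+r_2)\uparrow 1}}.$$

The last expression is exactly $E[(P(B_1))^{r_1} (1 - P(B_1))^{r_2}]$, where $P(B_1)$ has Beta
distribution with parameter $(\alpha(B_1), \alpha(B_2))$.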
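
The one-dimensional case can be checked with exact rational arithmetic. The Python sketch below is an illustration under our reading of the formulas above: `C` evaluates the numbers $C_n^{(s_1,s_2)}$ as factorial moments under the Polya weights in (18) with $k = 2$, `C_closed` is the product form obtained from (20), and `moment_Qn` evaluates the moment in (19). The function names and the test values of $\alpha(B_1)$ and $\alpha(B_2)$ are hypothetical.

```python
from fractions import Fraction
from math import comb


def rising(x, n):
    """Rising factorial (x)_{n up 1} = x (x + 1) ... (x + n - 1)."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out


def falling(x, n):
    """Falling factorial (x)_{n down 1} = x (x - 1) ... (x - n + 1)."""
    out = Fraction(1)
    for i in range(n):
        out *= x - i
    return out


def polya_weight(n, j1, a1, a2):
    """P(#{i: Y_i in B_1} = j1) for a Polya sequence, with a1 = alpha(B_1), a2 = alpha(B_2)."""
    return comb(n, j1) * rising(a1, j1) * rising(a2, n - j1) / rising(a1 + a2, n)


def C(n, s1, s2, a1, a2):
    """C_n^{(s1, s2)} computed directly as a sum over (j1, j2) with j1 + j2 = n."""
    return sum(
        polya_weight(n, j1, a1, a2) * falling(j1, s1) * falling(n - j1, s2)
        for j1 in range(n + 1)
    )


def C_closed(n, s1, s2, a1, a2):
    """Product form (alpha(B_1))_{s1 up} (alpha(B_2))_{s2 up} (n)_{s down} / (a)_{s up}."""
    s = s1 + s2
    return rising(a1, s1) * rising(a2, s2) * falling(n, s) / rising(a1 + a2, s)


def moment_Qn(n, r1, r2, a1, a2):
    """E[Q_n(B_1)^{r1} (1 - Q_n(B_1))^{r2}] from the mixture-of-Dirichlet representation."""
    total = sum(
        polya_weight(n, j1, a1, a2) * rising(j1, r1) * rising(n - j1, r2)
        for j1 in range(n + 1)
    )
    return total / rising(n, r1 + r2)


def moment_beta(r1, r2, a1, a2):
    """E[P(B_1)^{r1} (1 - P(B_1))^{r2}] for P(B_1) Beta with parameter (a1, a2)."""
    return rising(a1, r1) * rising(a2, r2) / rising(a1 + a2, r1 + r2)
```

With, say, $\alpha(B_1) = 3/2$ and $\alpha(B_2) = 5/2$, `C` and `C_closed` agree exactly for small $n$, and `moment_Qn(n, 2, 1, ...)` moves towards `moment_beta(2, 1, ...)` as $n$ grows, which is the convergence used in the proof.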

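This proof can be easily generalized to the case $k > 2$. Analogously to the
one-dimensional case, we can write

$$E\left[ (Q_n(B_1))^{r_1} \cdots (Q_n(B_{k-1}))^{r_{k-1}} \left( 1 - \sum_{i=1}^{k-1} Q_n(B_i) \right)^{r_k} \right]
= \frac{1}{(n)_{(r_1+\cdots+r_k)\uparrow 1}} \sum_{(j_1,\ldots,j_k) \in \mathcal{D}^{(0)}_{k,n}} \binom{n}{j_1 \cdots j_k} \frac{(\alpha(B_1))_{j_1\uparrow 1} \cdots (\alpha(B_k))_{j_k\uparrow 1}}{(a)_{n\uparrow 1}} (j_1)_{r_1\uparrow 1} \cdots (j_k)_{r_k\uparrow 1}$$

$$= \frac{1}{(n)_{(r_1+\cdots+r_k)\uparrow 1}} \sum_{t_1=0}^{r_1} |s(r_1,t_1)| \sum_{s_1=0}^{t_1} S(t_1,s_1) \cdots \sum_{t_k=0}^{r_k} |s(r_k,t_k)| \sum_{s_k=0}^{t_k} S(t_k,s_k)
\times \sum_{(j_1,\ldots,j_k) \in \mathcal{D}^{(0)}_{k,n}} \binom{n}{j_1 \cdots j_k} \frac{(\alpha(B_1))_{j_1\uparrow 1} \cdots (\alpha(B_k))_{j_k\uparrow 1}}{(a)_{n\uparrow 1}} (j_1)_{s_1\downarrow 1} \cdots (j_k)_{s_k\downarrow 1},$$

and, as before, define

$$C_n^{(s_1,\ldots,s_k)} := \sum_{(j_1,\ldots,j_k) \in \mathcal{D}^{(0)}_{k,n}} \binom{n}{j_1 \cdots j_k} \frac{(\alpha(B_1))_{j_1\uparrow 1} \cdots (\alpha(B_k))_{j_k\uparrow 1}}{(a)_{n\uparrow 1}} (j_1)_{s_1\downarrow 1} \cdots (j_k)_{s_k\downarrow 1}$$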



and prove they satisfy a recursive relation. Observe that ci''^' ' ''"'''/('^)(riH Vrk)\i is 

the moment of order $(r_1, \ldots, r_k)$ of the $k$-dimensional Polya distribution by definition.
Therefore, for $k > 2$



^ \jk) {aUi 



3k=Q 



X 



E 



X (.7i).siii---(jfc-i)sj,_i 
^ /n^ (a(Bfc))j,ti(a-a(5fe))(n-jOnOfc)'!ai 



ii 



X 



jkj (a)„ti 
(a(-Bi))siti • • ■ ("(^fc-i))s.-in(" - Jfc)(si+-+.sfc_iHi 



(a - a(Bfe))( 



Si+--- + Sfc_i)tl 



where the last equality follows by induction hypothesis (we already proved the base case
$k = 2$ in (20)). Then, following the same steps of the one-dimensional case, we can recover
a recursive equation for $C_n^{(s_1,\ldots,s_k)}$:

$$C_{n+1}^{(s_1,\ldots,s_k)} = \frac{(n+1)(\alpha(B_k) + s_k - 1)}{a + n}\, C_n^{(s_1,\ldots,s_{k-1},s_k-1)} + \frac{n+1}{a+n}\, C_n^{(s_1,\ldots,s_k)}. \tag{21}$$

Starting from $C_n^{(0,\ldots,0)} = 1$, $C_n^{(0,\ldots,0,1,0,\ldots,0)} = n\alpha(B_j)/a$ (with the 1 in position $j$) and

n — 1 / . , \ / 7-, \ n— 1 



,...,1) (» + l)a(^fc) ^(i,...,i,o) TT j + 1 

^ T{n + 1) (a(gi)).,n ■ ■ ■ (a(gfc-i))s,_iti^(-gfc) iV+z) ^(i,...,i,o) 
r(a + n) (a-a(Bfc))(,i+...+,,_^)^i 

r(n + 1) (a(Bi)),,ti ■ • ■ (a(Bfe_i)),,_,ti«(^fc) 



r(a + n) (a-Q!(i?fc))( 



,n+---+sfc-i)ti 



fe+i r(a + ^) (q - a(-Bfc))in(~0(fc-i)ti 



i=0 



r(i + ») (a 



r(-a + Q;(Bfc) - i + l)r(-fc -a + 2) 
^ r(-a - i + l)r(-a + a{Bk) + 2 - fc) 
q;(Si) • • •a(Sfc)(rt)fe;i 



(«)(feti) 

by repeated application of (21), we obtain

$$C_n^{(s_1,\ldots,s_k)} = \frac{(\alpha(B_1))_{s_1\uparrow 1} \cdots (\alpha(B_k))_{s_k\uparrow 1}\, (n)_{(s_1+\cdots+s_k)\downarrow 1}}{(a)_{(s_1+\cdots+s_k)\uparrow 1}}.$$

Thus,



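$$\lim_{n\to+\infty} E\left[ (Q_n(B_1))^{r_1} \cdots (Q_n(B_{k-1}))^{r_{k-1}} \left( 1 - \sum_{i=1}^{k-1} Q_n(B_i) \right)^{r_k} \right]$$

$$= \sum_{t_1=0}^{r_1} |s(r_1,t_1)| \sum_{s_1=0}^{t_1} S(t_1,s_1) \cdots \sum_{t_k=0}^{r_k} |s(r_k,t_k)| \sum_{s_k=0}^{t_k} S(t_k,s_k) \lim_{n\to+\infty} \frac{C_n^{(s_1,\ldots,s_k)}}{(n)_{(r_1+\cdots+r_k)\uparrow 1}}$$

$$= |s(r_1,r_1)|\, S(r_1,r_1) \cdots |s(r_k,r_k)|\, S(r_k,r_k)\, \frac{(\alpha(B_1))_{r_1\uparrow 1} \cdots (\alpha(B_k))_{r_k\uparrow 1}}{(a)_{(r_1+\cdots+r_k)\uparrow 1}}
= \frac{(\alpha(B_1))_{r_1\uparrow 1} \cdots (\alpha(B_k))_{r_k\uparrow 1}}{(a)_{(r_1+\cdots+r_k)\uparrow 1}}$$

$$= E\left[ (P(B_1))^{r_1} \cdots (P(B_{k-1}))^{r_{k-1}} \left( 1 - \sum_{i=1}^{k-1} P(B_i) \right)^{r_k} \right],$$

where $P$ is a Dirichlet process with parameter $\alpha$. □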



A.2. Solution of the distributional equation

Here, we provide an alternative proof for the solution of the distributional equation (3)
introduced by [11].

Theorem. For any fixed integer $n \ge 1$, the distributional equation

$$P \overset{d}{=} \theta \sum_{i=1}^{n} q_i^{(n)} \delta_{Y_i} + (1 - \theta) P$$

has the Dirichlet process with parameter $\alpha$ as its unique solution, assuming the independence
between $P$, $\theta$, $(q_1^{(n)}, \ldots, q_n^{(n)})$ and $(Y_1, \ldots, Y_n)$ in the right-hand side.

Proof. From Skorohod's theorem, it follows that there exist $n$ independent r.v.s
$\xi_1, \ldots, \xi_n$ such that $\xi_i$ has Beta distribution with parameter $(1, n - i)$ for $i = 1, \ldots, n$
and $q_1^{(n)} = \xi_1$ and $q_i^{(n)} = \xi_i \prod_{1\le j\le i-1} (1 - \xi_j)$ for $i = 2, \ldots, n$; in particular, by a simple
transformation of r.v.s, it follows that $(q_1^{(n)}, \ldots, q_n^{(n)})$ is distributed according to
a Dirichlet distribution function with parameter $(1, \ldots, 1)$. Further, since $\xi_n = 1$ a.s.,
then $\sum_{1\le i\le n} q_i^{(n)} = 1$ a.s. and it can be verified by induction that

i=l i=l 
Let $B_1, \ldots, B_k$ be a measurable partition of X. We first prove that conditionally on
$Y_1, \ldots, Y_n$, the finite dimensional distribution of the r.p.m. $\sum_{1\le i\le n} q_i^{(n)} \delta_{Y_i}$ is the Dirichlet
distribution with the empirical parameter $(\sum_{1\le i\le n} \delta_{Y_i}(B_1), \ldots, \sum_{1\le i\le n} \delta_{Y_i}(B_k))$. Actually,
since

$$\left( \sum_{i=1}^{n} q_i^{(n)} \delta_{Y_i}(B_1), \ldots, \sum_{i=1}^{n} q_i^{(n)} \delta_{Y_i}(B_k) \right) = \left( \sum_{i : Y_i \in B_1} q_i^{(n)}, \ldots, \sum_{i : Y_i \in B_k} q_i^{(n)} \right),$$

then, conditionally on the r.v.s $Y_1, \ldots, Y_n$, the r.v. $(\sum_{i : Y_i \in B_1} q_i^{(n)}, \ldots, \sum_{i : Y_i \in B_k} q_i^{(n)})$ is
distributed according to a Dirichlet distribution with parameter $(n_1, \ldots, n_k)$, where
$n_i = \sum_{1\le j\le n} \delta_{Y_j}(B_i)$ for $i = 1, \ldots, k$. Conditionally on $Y_1, \ldots, Y_n$, the finite dimensional
distributions of the right-hand side of (3) are Dirichlet with updated parameter
$(\alpha(B_1) + \sum_{1\le i\le n} \delta_{Y_i}(B_1), \ldots, \alpha(B_k) + \sum_{1\le i\le n} \delta_{Y_i}(B_k))$. This argument verifies that the
Dirichlet process with parameter $\alpha$ satisfies the distributional equation (3). This solution
is unique by Lemma 3.3 in [33] (see also [37], Section 1). □
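
The two ingredients of this proof, the stick-breaking construction of the weights and their aggregation over the sets of a partition, can be illustrated with a short Python sketch. It is a sketch under the stated construction only: $\xi_i$ is drawn as Beta$(1, n - i)$ for $i < n$ and set to 1 for $i = n$, and `labels[i]` is a hypothetical encoding of the index of the set $B_l$ containing $Y_i$. The function names are introduced here for illustration.

```python
import random


def stick_breaking_weights(n, rng):
    """Weights q_1 = xi_1, q_i = xi_i * prod_{j < i} (1 - xi_j), with xi_n = 1."""
    weights = []
    remaining = 1.0  # mass of the stick not yet assigned
    for i in range(1, n + 1):
        # xi_i ~ Beta(1, n - i); for i = n the distribution degenerates at 1.
        xi = 1.0 if i == n else rng.betavariate(1.0, n - i)
        weights.append(xi * remaining)
        remaining *= 1.0 - xi
    return weights


def grouped_weights(weights, labels, k):
    """Sum q_i over the indices i with Y_i in B_l, for l = 0, ..., k - 1."""
    out = [0.0] * k
    for w, label in zip(weights, labels):
        out[label] += w
    return out
```

For example, with $n = 4$ and `labels = [0, 0, 1, 0]` (so that $n_1 = 3$ and $n_2 = 1$), the first grouped weight should behave like a Beta r.v. with parameter (3, 1), whose mean is 3/4; averaging over many draws gives a quick sanity check of the conditional Dirichlet claim.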



A.3. Proofs of the theorems in Section 3



Proof of Theorem 3. The proof is based on the "standard" result that properties (e.g.,
weak convergence) of sequences of r.p.m.s can be proved via analogous properties of the
sequences of their linear functionals.

First of all, we prove that if $g : X \to \mathbb{R}$ is a bounded and continuous function, then
$\{G_m^{(n)}, m \ge 0\}$ with $G_m^{(n)} := \int_X g(x) P_m^{(n)}(dx)$ is a Markov chain on $\mathbb{R}$ with unique invariant
measure $\Pi_g$. From (12), it follows that $\{G_m^{(n)}, m \ge 1\}$ is a Markov chain on $\mathbb{R}$ restricted to
the compact set $[-\sup_X |g(x)|, \sup_X |g(x)|]$ and it has at least one finite invariant measure
if it is a weak Feller Markov chain. In fact, for a fixed $y \in \mathbb{R}$

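$$\liminf_{x \to x^*} P(G_m^{(n)} \le y \mid G_{m-1}^{(n)} = x)
= \liminf_{x \to x^*} P\left( \theta_m \sum_{i=1}^{n} q_{m,i}^{(n)} g(Y_{m,i}) \le y - x(1 - \theta_m) \right)$$

$$\ge \int_{(0,1)} \liminf_{x \to x^*} P\left( \sum_{i=1}^{n} q_{m,i}^{(n)} g(Y_{m,i}) \le \frac{y - x(1 - z)}{z} \right) P(\theta_m \in dz)$$

$$= \int_{(0,1)} P\left( \sum_{i=1}^{n} q_{m,i}^{(n)} g(Y_{m,i}) \le \frac{y - x^*(1 - z)}{z} \right) P(\theta_m \in dz)
= P(G_m^{(n)} \le y \mid G_{m-1}^{(n)} = x^*),$$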

since the distribution of $\sum_{1\le i\le n} q_{m,i}^{(n)} g(Y_{m,i})$ has at most a countable number of atoms
and $\theta_m$ is absolutely continuous. From Proposition 4.3 in [36], if we show that $\{G_m^{(n)},
m \ge 0\}$ is $\phi$-irreducible for a finite measure $\phi$, then the Markov chain is positive recurrent
and the invariant measure $\Pi_g$ is unique. Let us consider the following event $E := \{Y_{1,1} =
Y_{1,2} = \cdots = Y_{1,n}\}$. Then for a finite measure $\phi$ we have to prove that if $\phi(A) > 0$, then
$P(G_1^{(n)} \in A \mid G_0^{(n)}) > 0$ for any $G_0^{(n)}$. We observe that

$$P(G_1^{(n)} \in A \mid G_0^{(n)}) = P(G_1^{(n)} \in A \mid G_0^{(n)}, E)\, P(E \mid G_0^{(n)}) + P(G_1^{(n)} \in A \mid G_0^{(n)}, E^c)\, P(E^c \mid G_0^{(n)}).$$

Therefore, since $P(E) > 0$, using the same argument in Lemma 2 in [12], we conclude
that $P(G_1^{(n)} \in A \mid G_0^{(n)}, E) > 0$ for a suitable measure $\phi$ such that $\phi(A) > 0$. Finally, we
prove the aperiodicity of $\{G_m^{(n)}, m \ge 0\}$ by contradiction. If the chain is periodic with
period $d > 1$, there exist $d$ disjoint sets $D_1, \ldots, D_d$ such that $P(G_m^{(n)} \in D_{i+1} \mid G_{m-1}^{(n)} = x) = 1$
for all $x \in D_i$ and for $i = 1, \ldots, d - 1$. This implies $P(z \sum_{i=1}^{n} q_{m,i}^{(n)} g(Y_{m,i}) + (1 - z)x \in
D_{i+1}) = 1$ for almost every $z$ with respect to the Lebesgue measure restricted to (0, 1).
Thus, $P(\sum_{i=1}^{n} q_{m,i}^{(n)} g(Y_{m,i}) \in D_{i+1}) = 1$ for $i = 0, \ldots, d - 1$. For generic $\alpha$ and $g$, this is in
contradiction with the assumption $d > 1$. By Theorem 13.3.4(ii) in [22], $G_m^{(n)}$ converges in
distribution for $\Pi_g$-almost all starting points $G_0^{(n)}$. In particular, $\{G_m^{(n)}, m \ge 0\}$ converges
weakly for $\Pi_g$-almost all starting points $G_0^{(n)}$.

From the arguments above, it follows that, for all $g$ bounded and continuous, there
exists a r.v. $G$ such that $G_m^{(n)} \Rightarrow G$ as $m \to +\infty$ for $\Pi_g$-almost all starting points $G_0^{(n)}$.
Therefore, by Lemma 5.1 in [18] there exists a r.p.m. $P$ such that $P_m^{(n)} \Rightarrow P$ as $m \to +\infty$
and $G = \int_X g(x) P(\cdot, dx)$ for all $g \in C(\mathbb{R})$. This implies that the law of $P$ is an invariant
measure for the Markov chain $\{P_m^{(n)}, m \ge 0\}$. Then, as $m \to +\infty$,

$$\int_X g(x) P_m^{(n)}(\cdot, dx) \Rightarrow \int_X g(x) P(\cdot, dx)$$

and the limit is unique for any $g \in C(\mathbb{R})$. Since for any random measures $\zeta_1$ and $\zeta_2$ we
know that $\zeta_1 \overset{d}{=} \zeta_2$ if and only if $\int_X g(x) \zeta_1(\cdot, dx) \overset{d}{=} \int_X g(x) \zeta_2(\cdot, dx)$ for any $g \in C(\mathbb{R})$ (see
Theorem 3.1 in [18]), the invariant measure for the Markov chain $\{P_m^{(n)}, m \ge 0\}$ is unique.
By the definition of $\{P_m^{(n)}, m \ge 0\}$, it is straightforward to show that the limit $P$ must
satisfy (3) so that $P$ is the Dirichlet process with parameter $\alpha$. □
Proof of Theorem 4. The proof is a straightforward adaptation of the proof of
Theorem 2 in [12], using



log 1 



^iE'?M5(n,,) + (l-ei)G^' 



(n) 



< Elog(l + +log(l + (1 - diWo 



(»)i 



instead of their inequality (8). 



□ 



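Proof of Theorem 5. As regards (i), given the definition of stochastically monotone
Markov chain, we have that for $z_1 < z_2$, $s \in \mathbb{R}$,

$$p^{(n)}(z_1, (-\infty, s)) = P\left( \theta_1 \sum_{i=1}^{n} q_{1,i}^{(n)} Y_{1,i} + (1 - \theta_1) z_1 < s \right)
\ge P\left( \theta_1 \sum_{i=1}^{n} q_{1,i}^{(n)} Y_{1,i} + (1 - \theta_1) z_2 < s \right)
= p^{(n)}(z_2, (-\infty, s)).$$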

As far as (ii) is concerned, we first prove that, under condition (15), the Markov chain
$\{M_m^{(n)}, m \ge 0\}$ satisfies the Foster-Lyapunov condition for the function $V(x) = 1 + |x|$.
This property implies the geometric ergodicity of $\{M_m^{(n)}, m \ge 0\}$ (see [22], Chapter 15).
We have

$$PV(x) = \int_{\mathbb{R}} (1 + |y|)\, p^{(n)}(x, dy) = 1 + E\left[ \left| \theta_1 \sum_{i=1}^{n} q_{1,i}^{(n)} Y_{1,i} + (1 - \theta_1) x \right| \right]$$

$$\le 1 + E[\theta_1] \sum_{i=1}^{n} E[|q_{1,i}^{(n)} Y_{1,i}|] + |x|\, E[1 - \theta_1]
= 1 + \frac{n}{n+a} E[|Y_{1,1}|] + \frac{a}{n+a} |x|.$$



Therefore, we are looking for the small set $C^{(n)}$ such that the Foster-Lyapunov condition
holds, that is, a small set $C^{(n)}$ such that

$$1 + \frac{n}{n+a} E[|Y_{1,1}|] + \frac{a}{n+a} |x| \le \lambda (1 + |x|) + b \mathbf{1}_{C^{(n)}}(x)$$

for some constant $b < +\infty$ and $0 < \lambda < 1$. If $C^{(n)} = [-K^{(n)}(\lambda), K^{(n)}(\lambda)]$, where

$$K^{(n)}(\lambda) := \frac{1 - \lambda + n/(n+a)\, E[|Y_{1,1}|]}{\lambda - a/(n+a)}, \tag{22}$$
then, condition (22) holds for all

$$\lambda > \frac{a}{n+a}, \qquad b \ge 1 - \lambda + \frac{n}{n+a} E[|Y_{1,1}|].$$



As in the proof of Theorem 3, we can prove that the Markov chain $\{M_m^{(n)}, m \ge 0\}$ is weak
Feller; then, since $C^{(n)}$ is a compact set, it is a small set (see [35]). As regards (iii), the
proof follows by standard arguments. See, for instance, the proof of Theorem 1 in [15]. □
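
The drift computation in part (ii) lends itself to a small numerical check. The Python sketch below is illustrative only and assumes the bound $PV(x) \le 1 + \frac{n}{n+a} E[|Y_{1,1}|] + \frac{a}{n+a}|x|$ for $V(x) = 1 + |x|$, together with the radius $K^{(n)}(\lambda)$ of the candidate small set; the function names and the choice of the smallest admissible $b$ are assumptions made here for illustration.

```python
def drift_bound(x, n, a, mean_abs_y):
    """Upper bound on PV(x) for V(x) = 1 + |x|, with mean_abs_y = E|Y_{1,1}|."""
    return 1.0 + n / (n + a) * mean_abs_y + a / (n + a) * abs(x)


def small_set_radius(lam, n, a, mean_abs_y):
    """Radius K^{(n)}(lambda) of the interval C^{(n)} = [-K, K]."""
    if not (a / (n + a) < lam < 1.0):
        raise ValueError("lambda must lie strictly between a / (n + a) and 1")
    return (1.0 - lam + n / (n + a) * mean_abs_y) / (lam - a / (n + a))


def minimal_b(lam, n, a, mean_abs_y):
    """Smallest constant b for which the drift inequality holds at x = 0."""
    return 1.0 - lam + n / (n + a) * mean_abs_y


def drift_condition_holds(x, lam, n, a, mean_abs_y, tol=1e-12):
    """Check drift_bound(x) <= lam * V(x) + b * 1{|x| <= K} with b = minimal_b."""
    k = small_set_radius(lam, n, a, mean_abs_y)
    b = minimal_b(lam, n, a, mean_abs_y)
    rhs = lam * (1.0 + abs(x)) + (b if abs(x) <= k else 0.0)
    return drift_bound(x, n, a, mean_abs_y) <= rhs + tol
```

For the Uniform case of Example 1 ($E[|Y_{1,1}|] = 1/2$) with $n = 10$, $a = 10$ and $\lambda = 0.75$, the radius is $K^{(n)}(\lambda) = (0.25 + 0.25)/0.25 = 2$, and the inequality can be verified pointwise both inside and outside $[-2, 2]$.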

Proof of Remark 1. As we have already mentioned, the geometric ergodicity follows
if a Foster-Lyapunov condition holds. Let $V(x) = 1 + |x|^s$; then, if
$E[(1 + |\sum_{1\le i\le n} q_{1,i}^{(n)} Y_{1,i}|)^s] < +\infty$, it is straightforward to prove that the Foster-Lyapunov
condition $PV(x) \le \lambda V(x) + b \mathbf{1}_{C^{(n)}}(x)$ holds for some constant $b < +\infty$, and $\lambda$ such that

$$E[(1 - \theta_1)^s] = \frac{\Gamma(a+s)\Gamma(a+n)}{\Gamma(a)\Gamma(a+s+n)} < \lambda < 1$$

and for some compact set $C^{(n)}$. Of course (16) implies $E[(1 + |\sum_{1\le i\le n} q_{1,i}^{(n)} Y_{1,i}|)^s] <
+\infty$; in fact, conditioning on the random number $N$ of distinct values $Y^*_{1,1}, \ldots, Y^*_{1,N}$ in
$Y_{1,1}, \ldots, Y_{1,n}$, $1 \le N \le n$, we have

N 



<E9Min:J<niax{|yi:i|,...,|n>|}. 



Since $\{Y^*_{1,1}, \ldots, Y^*_{1,N}\}$ are independent and identically distributed according to $\alpha_0$, then



E 



i=l 



< 



+ 00 



y'NiAo{y)y'-'aoidy)<N 



-i-CX3 



2/'ao(d2/)<nE[|yi,,|-^]<+oo, 



where ^0 is the distribution corresponding to the probability measure ao, and this is 
equivalent to E[(l + | Ei<^<„ < Q 